# MBS NYU Custom Edition full semester notes

16 Pages

School: New York University
Department: Statistics & Operations Research
Course: STAT-UB 103
Professor: Ardeshir Shahmaei
Semester: Fall

Description
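Kevin Jin, STAT-UB 103 Cumulative Notes

Note on notation:

- $\forall$: "for all", $\exists$: "exists"
- $\lor$: "or", $\land$: "and"
- $\cup$: "or" for probability (set union), $\cap$: "and" for probability (set intersection)

## Concepts

- Fundamentals

  | Statistic | Sample data | Population data |
  | --- | --- | --- |
  | Size | $n$ | $N$ |
  | Mean | $\bar{x} = \frac{\sum x}{n}$ | $\mu = \frac{\sum x}{N}$ |
  | Range | $\max - \min$ | $\max - \min$ |
  | Variance | $s^2 = \frac{\sum x^2 - (\sum x)^2 / n}{n - 1}$ | $\sigma^2 = \frac{\sum x^2 - (\sum x)^2 / N}{N}$ |
  | Standard deviation | $s = \sqrt{s^2}$ | $\sigma = \sqrt{\sigma^2}$ |

  - Mean, median, and mode measure the central tendency of data. Range, variance, and standard deviation measure the variability of data.
  - Chebyshev's theorem: for any distribution, nothing is guaranteed about the share of data within 1 standard deviation of the mean; at least $3/4$ of the data is within 2, at least $8/9$ is within 3, and in general at least $1 - 1/k^2$ of the data is within $k$ standard deviations of the mean ($k > 1$).
  - Empirical rule: if the frequency distribution is symmetrical and mound shaped, the data is approximately normal when about 68% of the data is within 1 standard deviation of the mean, 95% is within 2, and 99.7% is within 3.
  - Histograms illustrate the frequency distribution.
    - Optimal number of classes: the smallest $k$ such that $2^k > n$, where $n$ is the size of the data.
    - Class width: at least $(\max - \min)/k$.
    - Label the midpoint of each class on the $x$-axis and leave no gaps between classes.
    - Used to find skewness, outliers, and bimodal distributions.
  - Quartiles
    - To find the element at a fractional position $i$ in the sorted data, use the two neighboring elements:
      - $\lfloor i \rfloor$, i.e. $i$ "rounded down to the previous integer"
      - $\lceil i \rceil$, i.e. $i$ "rounded up to the next integer"
      - The value lies between $x_{\lfloor i \rfloor}$ and $x_{\lceil i \rceil}$ (for example, by interpolating between them).
    - First quartile: $Q_L = x_{(n+1)/4}$
    - Median: $M = x_{(n+1)/2}$
    - Third quartile: $Q_U = x_{3(n+1)/4}$
    - Interquartile range: $IQR = Q_U - Q_L$
    - Lower inner fence: $Q_L - 1.5\,(IQR)$
    - Upper inner fence: $Q_U + 1.5\,(IQR)$
    - Illustrated with box and whisker plots:
      - Draw a box with leftmost edge $Q_L$, rightmost edge $Q_U$, and a vertical line at $M$.
      - Draw a horizontal line extending from $Q_U$ to the largest $x$ inside the upper inner fence; mark it with a solid circle.
      - Draw a horizontal line extending from $Q_L$ to the smallest $x$ inside the lower inner fence; mark it with a solid circle.
      - Any $x$ outside the inner fences is an outlier and is drawn with an x.
  - A sample is left skewed if more points are to the right of the mean.
    - Histogram and frequency distribution have their peak to the right of the mean.
    - Median in the box and whisker chart is closer to the third quartile than the first.
    - Mean < median < mode
  - A sample is normal if points are equally distributed to the left and right of the mean and the mean has the highest frequency.
    - Histogram and frequency distribution have their peak near the mean.
    - Median in the box and whisker chart is equidistant from the first and third quartiles.
    - Mean = median = mode
    - Probability plot is approximately linear.
    - Empirical rule holds.
    - Stem and leaf plot has an equal number of leaves above and below the mean.
  - A sample is right skewed if more points are to the left of the mean.
    - Histogram and frequency distribution have their peak to the left of the mean.
    - Median in the box and whisker chart is closer to the first quartile than the third.
    - Mode < median < mean
- Probability
  - Rule of complements: $P(A) + P(A^c) = 1$, so $P(A^c) = 1 - P(A)$.
  - Inclusion-exclusion (additive rule): $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
    - When $A, B$ are mutually exclusive, $P(A \cap B) = 0$, so $P(A \cup B) = P(A) + P(B)$.
  - Conditional probability (multiplicative rule): $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, so $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$.
    - When $A, B$ are independent, $P(A \mid B) = P(A)$, so $P(A \cap B) = P(A)\,P(B)$.
    - Bayes' rule: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$
- Random variables
  - Single variable
    - Assumptions:
      - $0 \le p(x) \le 1$
      - $\sum_x p(x) = 1$
    - $\mu = E(X) = \sum_x x\,p(x)$
    - $\sigma^2 = E[(X - \mu)^2] = \sum_x (x - \mu)^2 p(x) = \sum_x x^2 p(x) - \mu^2$
    - $\sigma = \sqrt{\sigma^2}$
    - Coefficient of variation, $CV = \sigma / \mu$: measures variation relative to the mean.
      - Lower CV = lower risk per unit of reward = "better" investment. Portfolios typically have lower CVs than the individual stocks.
      - Some investors still prefer the higher expected value regardless of how high the volatility is.
    - $E(aX + b) = a\,E(X) + b$
    - $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
    - $\sigma_{aX+b} = |a|\,\sigma_X$
  - Joint variable probability
    - $p(x, y)$ is shorthand for $P(X = x \cap Y = y)$.
    - Assumptions:
      - $0 \le p(x, y) \le 1$
      - $\sum_x \sum_y p(x, y) = 1$
    - $E(X + Y) = E(X) + E(Y)$
    - $E(XY) = \sum_x \sum_y x\,y\,p(x, y)$
    - Covariance: measures dependence between two variables.
      - $\mathrm{Cov}(X, Y) = E(XY) - E(X)\,E(Y)$
      - If $X, Y$ are independent, $E(XY) = E(X)\,E(Y)$, so $\mathrm{Cov}(X, Y) = 0$.
    - $p_X(x) = \sum_y p(x, y)$: marginal probability of $X$ at $x$.
    - Coefficient of correlation, $\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$: measures the strength of the linear relationship between $X$ and $Y$.
      - $\rho = 0$: no linear relationship, $\rho = 1$: perfect positive relationship, $\rho = -1$: perfect negative relationship.
    - Joint conditional probability: $P(X = x \mid Y = y) = \frac{p(x, y)}{p_Y(y)}$
    - When $X, Y$ are independent:
      - $P(X = x \mid Y = y) = p_X(x)$
      - $p(x, y) = p_X(x)\,p_Y(y)$
- Probability distributions
  - Binomial: $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
    - $p$: probability of success, $n$: number of trials, $x$: number of successes.
    - The binomial distribution is a discrete distribution used to find a probability involving a success rate, i.e. proportions.
    - $P(X \le x) = \sum_{i=0}^{x} P(X = i)$, or use the cumulative table.
    - If evaluating $P(X \le x)$ and $x$ is more than half of $n$, $1 - P(X \ge x + 1)$ by the rule of complements needs fewer terms than summing $P(X = 0), \dots, P(X = x)$. Likewise, if evaluating $P(X \ge x)$ and $x$ is less than half of $n$, do $1 - P(X \le x - 1)$.
  - Poisson: $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$
    - $\lambda$: expected number of events per unit interval (i.e. the mean), $x$: number of events that occur in a unit interval.
    - The Poisson distribution is a discrete distribution used to find the probability that a certain number of events occur in a time interval.
    - $P(X \le x) = \sum_{i=0}^{x} P(X = i)$
    - $x$ can be any nonnegative integer, so problems asking for $P(X \ge x)$ must be turned into $1 - P(X \le x - 1)$ by the rule of complements.
  - Normal: $P(X < x) = P\!\left(Z < \frac{x - \mu}{\sigma}\right)$ is found using the $z$-table.
    - $x$: value of the random variable, $\mu$: mean, $\sigma$: standard deviation.
    - The normal distribution is a continuous distribution used to find the probability that $X$ falls in a range, if $X$ is verified to be normally distributed.
    - $z = \frac{x - \mu}{\sigma}$
    - $\mu = 0$ and $\sigma = 1$ for the transformed distribution.
    - Continuous distribution, so $P(X < x) = P(X \le x)$ for all $x$ we're concerned with.
    - The $z$-table expresses the probability between 0 and $z$ when $z > 0$, so other expressions must be turned into the form $P(0 < Z < z)$.
      - If $z$ is positive:
        - $P(Z > z) = 0.5 - P(0 < Z < z)$
        - $P(Z < z) = P(Z < 0) + P(0 < Z < z) = 0.5 + P(0 < Z < z)$
      - If $z$ is negative:
        - $P(Z < z) = 0.5 - P(0 < Z < |z|)$
        - $P(Z > z) = 0.5 + P(0 < Z < |z|)$
      - For $P(z_1 < Z < z_2)$: if $z_1$ and $z_2$ have the same sign, subtract the smaller table value from the larger; if they have opposite signs, the result is $P(0 < Z < |z_1|) + P(0 < Z < z_2)$.
      - No need to memorize this, just draw a picture.
- Sampling distributions
  - Central limit theorem: when $n$ is sufficiently large, the sampling distribution of $\bar{x}$ will be approximately normal, where $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
    - The mean of the collected sample means is the mean of the population.
    - The standard deviation of the sample means is the standard deviation of the population divided by the square root of the sample size.
  - Point estimate: a single number derived from a sample to estimate the corresponding parameter of the population, e.g. $\bar{x}$.
  - Confidence interval/interval estimate: uses sample data to calculate an interval that estimates the target population parameter.
  - $\sigma_{\bar{x}} = \sigma / \sqrt{n} \approx s / \sqrt{n}$, standard error of the mean: the standard deviation of the sampling distribution.
  - $ME$, margin of error/sampling error: the term added to and subtracted from $\bar{x}$ when finding a confidence interval.
    - Found by multiplying the standard error (e.g. $s / \sqrt{n}$) by the critical value (i.e. $z_{\alpha/2}$, $t_{\alpha/2}$); the margin of error is the half-width of the confidence interval.
    - E.g. $z_{\alpha/2}\,\sigma / \sqrt{n}$, $z_{\alpha/2}\,s / \sqrt{n}$, or $t_{\alpha/2}\,s / \sqrt{n}$.
- Hypothesis testing
  - Degrees of freedom: $df = n - 1$ (one sample) or $n_1 + n_2 - 2$ (two samples).
  - $H_0$ (null hypothesis: "status quo") always has an equals sign; $H_a$ (alternate hypothesis) never does.
  - $\alpha$: level of significance.
    - Defines the unlikely values of the sample statistic if $H_0$ is true.
  - Rejection regions:
    - $z$-test: reject $H_0$ if the observed test statistic is in these regions.
      - If the test is $H_a$: population variable $\ne$ target:
        - Two tail test, where each tail has area $\alpha / 2$.
        - Reject $H_0$ if $|z| > z_{\alpha/2}$.
      - If the test is $H_a$: population variable $>$ target:
        - Upper tail test (right tail), where the tail has area $\alpha$.
        - Reject $H_0$ if $z > z_\alpha$.
      - If the test is $H_a$: population variable $<$ target:
        - Lower tail test (left tail), where the tail has area $\alpha$.
        - Reject $H_0$ if $z < -z_\alpha$.
    - $t$-test: use the $t$-distribution table with $df$.
    - (Not on exam) $\chi^2$-test (chi-square test):
      - Same procedure as the $t$-test, using the $\chi^2$-distribution table instead.
    - $F$-test:
      - All tests are one-sided upper tail tests.
      - Reject $H_0$ if $F > F_\alpha$.
  - Error: the actual population measurement does not match what was concluded from the sample measurements.
    - Type I ($\alpha$) error: $H_0$ is rejected based on the sample, when $H_0$ is true in the population.
    - Type II ($\beta$) error: $H_0$ is not rejected based on the sample, when $H_0$ is false in the population.
  - Assumptions:
    - Randomly selected sample.
    - The sampling distribution is approximately normal, e.g. when:
      - The problem says so!
      - $n \ge 30$ (single sample)
      - $n_1 \ge 30$ and $n_2 \ge 30$ (multiple samples)
      - $n\hat{p}$ and $n(1 - \hat{p})$ are both large enough (single proportion)
      - $n_1\hat{p}_1$, $n_1(1 - \hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1 - \hat{p}_2)$ are all large enough (multiple proportions)
  - $(\mu_1 - \mu_2)$, $(p_1 - p_2)$:
    - Difference of means/proportions for two samples, used in $H_0$.
    - The hypothesized difference that $(\mu_1 - \mu_2)$ or $(p_1 - p_2)$ is compared to is known as $D_0$, typically 0.
  - $\mu_d$:
    - Difference of an ordered pair.
    - $\bar{x}_d$ is the mean of all differences in two dependent samples; $s_d$ is the standard deviation of all differences in two dependent samples.
  - Hypothesis testing steps
    - I. Specify the population variable of interest (e.g. $\mu$, $p$, $\sigma^2$).
    - II. Formulate $H_0$ and $H_a$ by assigning population values of interest (e.g. $H_0: \mu = \mu_0$).
    - III. Specify $\alpha$.
    - IV. Draw the rejection region based on $H_a$. Label the tail boundaries (e.g. $z_\alpha$, $t_{\alpha/2}$).
    - V. Compute the observed test statistic and see if it falls within the rejection regions.
    - VI. Decide whether to reject or not reject the null hypothesis and restate the hypothesis that was supported (e.g. reject $H_0$).
  - The $p$-value is the probability of observing a sample result at least as extreme as the one observed if $H_0$ is true.
    - Observed level of significance.
    - Let $Z$ be the test statistic variable and $z$ be the observed test statistic value calculated from a formula (e.g. $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$).
    - If two tail test: $p = P(Z < -|z|) + P(Z > |z|) = 2\,P(Z > |z|)$
    - If upper tail test: $p = P(Z > z)$
    - If lower tail test: $p = P(Z < z)$
    - If $p$-value $< \alpha$, reject $H_0$.

## Solving hypothesis test, confidence interval problems

- "Test for $\mu$, one sample"
  - $H_0: \mu = \mu_0$, $H_a: \mu \mathbin{\bigcirc} \mu_0$, where $\bigcirc$ is $\ne$, $>$, or $<$.
  - If $n \ge 30$ ($\sigma$ is given, or $s$ is given and can be used to approximate $\sigma$):
    - $\sigma_{\bar{x}} = \sigma / \sqrt{n} \approx s / \sqrt{n}$
    - Confidence interval: $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$
    - Observed test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}}$
  - If $n < 30$:
    - $s_{\bar{x}} = s / \sqrt{n}$
    - Confidence interval: $\bar{x} \pm t_{\alpha/2}\,s_{\bar{x}}$, with $df = n - 1$
    - Observed test statistic: $t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}$
  - (Not on exam): If $n / N$ is large (finite population of size $N$):
    - $\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N}}$
    - Confidence interval: $\bar{x} \pm 2\,\hat{\sigma}_{\bar{x}}$
    - Observed test statistic: (N/A)
- "Find confidence interval for $\mu$"
  - $\alpha = 1 - \text{confidence level}$; look up the critical value at $\alpha / 2$.
  - $[\bar{x} - ME,\ \bar{x} + ME]$
- "How large of a sample is needed for a given margin of error" ($ME$ = half of the interval width, or the problem gives a $\pm$ from $\bar{x}$)
  - $ME = z_{\alpha/2}\,\sigma_{\bar{x}} = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
  - $n = \frac{(z_{\alpha/2})^2\,\sigma^2}{ME^2}$, rounded up to the next integer.
- "Test for $p$, one sample"
  - $H_0: p = p_0$, $H_a: p \mathbin{\bigcirc} p_0$, where $\bigcirc$ is $\ne$, $>$, or $<$.
  - $\hat{p} = x / n$: sample proportion (estimated probability of success).
  - $n\hat{p}$: number of successes in the sample, $n(1 - \hat{p})$: number of failures in the sample.
  - If $n\hat{p}$ and $n(1 - \hat{p})$ are both large enough:
    - $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$, using $p_0$ for $p$ in a test and $\hat{p}$ in a confidence interval.
    - Confidence interval: $\hat{p} \pm z_{\alpha/2}\,\sigma_{\hat{p}}$
    - Observed test statistic: $z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}$
  - If $n\hat{p}$ or $n(1 - \hat{p})$ is too small:
    - $\tilde{p} = \frac{x + 2}{n + 4}$, where $x = n\hat{p}$ is the number of successes.
    - $\sigma_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$
    - Confidence interval: $\tilde{p} \pm z_{\alpha/2}\,\sigma_{\tilde{p}}$
    - Observed test statistic: (N/A)
  - (Not on exam): If $n / N$ is large (finite population of size $N$):
    - $\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \sqrt{\frac{N - n}{N}}$
    - Confidence interval: $\hat{p} \pm 2\,\hat{\sigma}_{\hat{p}}$
    - Observed test statistic: (N/A)
- "Find confidence interval for $p$"
  - $\alpha = 1 - \text{confidence level}$; look up the critical value at $\alpha / 2$.
  - If $n\hat{p}$ and $n(1 - \hat{p})$ are both large enough: $[\hat{p} - ME,\ \hat{p} + ME]$
  - Otherwise: $[\tilde{p} - ME,\ \tilde{p} + ME]$
- (Not on exam): "Test for $\sigma^2$, one sample"
  - $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 \mathbin{\bigcirc} \sigma_0^2$, where $\bigcirc$ is $\ne$, $>$, or $<$.
  - Standard error: (N/A)
  - Observed test statistic: $\chi^2 = \frac{(n - 1)\,s^2}{\sigma_0^2}$
- (Not on exam): "Find confidence interval for $\sigma^2$"
  - $\alpha = 1 - \text{confidence level}$; look up the critical values at $\alpha / 2$ and $1 - \alpha / 2$.
  - $\left[ \frac{(n - 1)\,s^2}{\chi^2_{\alpha/2}},\ \frac{(n - 1)\,s^2}{\chi^2_{1 - \alpha/2}} \right]$
- "Test for $\mu_1 - \mu_2$, two independent samples (unpaired difference)"
  - $H_0: (\mu_1 - \mu_2) = D_0$, $H_a: (\mu_1 - \mu_2) \mathbin{\bigcirc} D_0$, where $\bigcirc$ is $\ne$, $>$, or $<$.
  - If $n_1 \ge 30$ and $n_2 \ge 30$ ($\sigma$ is given, or $s$ is given and can be used to approximate $\sigma$):
    - $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \approx \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
    - Confidence interval: $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$
    - Observed test statistic: $z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}}$
  - If the samples are small: …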
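The quartile positions $(n+1)/4$, $(n+1)/2$, $3(n+1)/4$ and the inner fences from the Fundamentals notes can be sketched in Python. This is a minimal illustration, not the course's prescribed procedure: the function names `element_at` and `box_plot_summary` are hypothetical, and resolving a fractional position by linear interpolation between the rounded-down and rounded-up elements is an assumption.

```python
import math


def element_at(sorted_data, position):
    """Value at a 1-based, possibly fractional position.

    Assumption: a fractional position is resolved by linear interpolation
    between the elements at floor(position) and ceil(position).
    """
    n = len(sorted_data)
    # Clamp the neighbor indices so tiny samples do not run off the list.
    lo = min(max(math.floor(position), 1), n)
    hi = min(max(math.ceil(position), 1), n)
    x_lo = sorted_data[lo - 1]
    x_hi = sorted_data[hi - 1]
    fraction = position - math.floor(position)
    return x_lo + fraction * (x_hi - x_lo)


def box_plot_summary(data):
    """Quartiles, IQR, inner fences, and outliers for a box and whisker plot."""
    xs = sorted(data)
    n = len(xs)
    q_lower = element_at(xs, (n + 1) / 4)
    median = element_at(xs, (n + 1) / 2)
    q_upper = element_at(xs, 3 * (n + 1) / 4)
    iqr = q_upper - q_lower
    lower_fence = q_lower - 1.5 * iqr
    upper_fence = q_upper + 1.5 * iqr
    # Points outside the inner fences are flagged as outliers.
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {
        "q_lower": q_lower,
        "median": median,
        "q_upper": q_upper,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 100])
print(summary)
```

For the eight sample values, the quartile positions are 2.25, 4.5, and 6.75, so the value 100 falls beyond the upper inner fence and is reported as an outlier.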
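The binomial and Poisson notes lean on the rule of complements to avoid long sums. The sketch below shows one way to compute those probabilities directly with the Python standard library; the function names are hypothetical, and the formulas used are the standard binomial and Poisson probability functions rather than anything specific to the course tables.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for n trials with success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_at_most(x, n, p):
    """P(X <= x), summed term by term (what the cumulative table lists)."""
    return sum(binomial_pmf(i, n, p) for i in range(0, x + 1))


def binomial_at_least(x, n, p):
    """P(X >= x) by the rule of complements: 1 - P(X <= x - 1)."""
    return 1.0 - binomial_at_most(x - 1, n, p)


def poisson_pmf(x, lam):
    """P(X = x) when lam events are expected per unit interval."""
    return lam**x * math.exp(-lam) / math.factorial(x)


def poisson_at_least(x, lam):
    """P(X >= x) = 1 - P(X <= x - 1); the direct sum would never end."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(0, x))


print(binomial_pmf(2, 4, 0.5))       # exactly two successes in four fair trials
print(binomial_at_least(1, 4, 0.5))  # at least one success
print(poisson_at_least(1, 2.0))      # at least one event when two are expected
```

The Poisson case is where the complement is required rather than merely faster, because the number of events has no upper bound.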
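The one-sample test for a mean, its confidence interval, and the sample-size question can be tied together in a short Python sketch. It assumes the large-sample case with a known (or well-approximated) population standard deviation, so the $z$ distribution applies; the function names and the example numbers are hypothetical, and `statistics.NormalDist` stands in for the printed $z$-table.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_test_one_sample(x_bar, mu_0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, p_value, reject_h0) for H0: mu = mu_0."""
    standard_error = sigma / math.sqrt(n)
    z = (x_bar - mu_0) / standard_error
    if tail == "two":
        p_value = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    elif tail == "upper":
        p_value = 1 - STANDARD_NORMAL.cdf(z)
    elif tail == "lower":
        p_value = STANDARD_NORMAL.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return z, p_value, p_value < alpha


def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """x_bar plus or minus the margin of error z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z_critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    margin = z_critical * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n whose margin of error does not exceed `margin`."""
    alpha = 1 - confidence
    z_critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return math.ceil((z_critical * sigma / margin) ** 2)


z, p_value, reject = z_test_one_sample(x_bar=52, mu_0=50, sigma=10, n=100)
print(z, p_value, reject)
print(confidence_interval(x_bar=52, sigma=10, n=100))
print(required_sample_size(sigma=10, margin=2))
```

Comparing the $p$-value with $\alpha$ gives the same decision as checking whether $z$ falls in the rejection region; for a small sample the same structure would apply with a $t$ critical value, which the standard library does not provide.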