W12 Course Notes.pdf (33 pages)
School: University of Waterloo
Department: Statistics
Course: STAT 231
Professor: Brad Lushman
Semester: Fall

Description
STAT 231 (Winter 2012 - 1121) Honours Statistics
Prof. R. Metzger, University of Waterloo
LaTeXer: W. Kong
Last Revision: April 3, 2012

Table of Contents
1 PPDAC
  1.1 Problem
  1.2 Plan
  1.3 Data
  1.4 Analysis
  1.5 Conclusion
2 Measurement Analysis
  2.1 Measurements of Spread
  2.2 Measurements of Association
3 Probability Theory
4 Statistical Models
  4.1 Generalities
  4.2 Types of Models
5 Estimates and Estimators
  5.1 Motivation
  5.2 Maximum Likelihood Estimation (MLE) Algorithm
  5.3 Estimators
  5.4 Biases in Statistics
6 Distribution Theory
  6.1 Student t-Distribution
  6.2 Least Squares Method
7 Intervals
  7.1 Confidence Intervals
  7.2 Prediction Intervals
  7.3 Likelihood Intervals
8 Hypothesis Testing
9 Comparative Models
10 Experimental Design
11 Model Assessment
12 Chi-Squared Test
References
Appendix A

These notes are currently a work in progress, and as such may be incomplete or contain errors. [Source: lambertw.com]

List of Figures
1.1 Population Relationships
1.2 Box Plots (from Wikipedia)

Acknowledgments: Special thanks to Michael Baker and his LaTeX-formatted notes. They were the inspiration for the structure of these notes.

Abstract
The purpose of these notes is to provide a guide to the second-year honours statistics course. The contents of this course are designed to satisfy the following objectives:
- To provide students with a basic understanding of probability, the role of variation in empirical problem solving, and statistical concepts (to be able to critically evaluate, understand and interpret statistical studies reported in newspapers, on the internet, and in scientific articles).
- To provide students with the statistical concepts and techniques necessary to carry out an empirical study to answer relevant questions in any given area of interest.
The recommended prerequisites are Calculus I and II (one of MATH 137/147 and one of MATH 138/148) and Probability (STAT 230). Readers should have a basic understanding of single-variable differential and integral calculus as well as a good understanding of probability theory.

1 PPDAC

PPDAC is a process, or recipe, used to solve statistical problems. It stands for:
Problem / Plan / Data / Analysis / Conclusion

1.1 Problem
The problem step's job is to clearly define the
1. Goal or aspect of the study
2. Target population and units
3. Units' variates
4. Attributes and parameters

Definition 1.1. The target population is the set of animals, people or things about which you wish to draw conclusions. A unit is a singleton of the target population.

Definition 1.2. The sample population is a specified subset of the target population. A sample is a singleton of the sample population and a unit of the study population.

Example 1.1. If I were interested in the average age of all students taking STAT 231, then:
Unit = a student of STAT 231
Target Population (T.P.) = students of STAT 231

Definition 1.3. A variate is a characteristic of a single unit in a target population and is usually one of the following:
1. Response variates - the variates of interest in the study
2. Explanatory variates - why responses vary from unit to unit
   (a) Known - variates that are known to cause the responses
       i. Focal - known variates that divide the target population into subsets
   (b) Unknown - variates that cause responses but cannot be explained in the study

Using Ex. 1.1 as a guide, we can think of the response variate as the age of a student, and the explanatory variates as factors such as when and where the student was born, their familial situation, educational background and intelligence. For an example of a focal variate, we could think of something along the lines of domestic vs. international students, or male vs. female.

Definition 1.4. An attribute is a characteristic of a population, usually denoted by a function of the response variate. It can have two other names, depending on the population studied:
- Parameter is used when studying populations
- Statistic is used when studying samples
- Attribute can be used interchangeably with the above

Definition 1.5. The aspect is the goal of the study and is generally one of the following:
1. Descriptive - describing or determining the value of an attribute
2. Comparative - comparing the attribute of two (or more) groups
3. Causative - trying to determine whether a particular explanatory variate causes a response to change
4. Predictive - predicting the value of a response variate using your explanatory variates

1.2 Plan
The job of the plan step is to accomplish the following:
1. Define the study protocol - this determines the study population, the population that is actually studied, which is NOT always a subset of the T.P.
2. Define the sampling protocol - the sampling protocol is used to draw a sample from the study population
   (a) Some types of sampling protocol include:
       i. Random sampling
       ii. Judgment sampling (e.g. fixed gender ratios)
       iii. Volunteer sampling
       iv. Representative sampling - a sample that matches the sample population (S.P.) in all important characteristics (i.e. the proportion of a certain important characteristic in the S.P. is the same in the sample)
   (b) Generally, statisticians prefer random sampling
3. Define the sample
4. Define the measurement system - this defines the tools/methods used

A visualization of the relationship between the populations is below. In the diagram, T.P. is the target population and S.P. is the sample population.

Figure 1.1: Population Relationships (T.P. to S.P.: study error; S.P. to sample: sample error)

Example 1.2. Suppose that we wanted to compare the common sense of maths and arts students at the University of Waterloo. An experiment that we could do is take 50 arts students and 33 maths students from the University of Waterloo taking an introductory statistics course this term, have them write a (non-mathematical) statistics test, and average the results for each group. In this situation we have:
- T.P. = all arts and maths students
- S.P. = arts and maths students from the University of Waterloo taking an introductory statistics course this term
- Sample = the 50 arts students and 33 maths students from the University of Waterloo taking an introductory statistics course this term
- Aspect = comparative study
- Attribute(s): average grade of arts and maths students (for T.P., S.P., and sample)
  - parameter for T.P. and S.P.
  - statistic for the sample
There are, however, some issues:
1. Sample units differ from S.P. units
2. S.P. units differ from T.P. units
3. We want to measure common sense, but the test is measuring statistical knowledge
4. Is it fair to put arts students in a maths course?

Example 1.3. Suppose that we want to investigate the effect of cigarettes on the incidence of lung cancer in humans. We could do this by purchasing mice online, randomly selecting 43 of them, and letting them smoke 20 cigarettes a day. We then conduct an autopsy at the time of death to check for lung cancer. In this situation, we have:
- T.P. = all people who smoke
- S.P. = mice bought online
- Sample = the 43 selected mice
- Aspect = not comparative
- Attributes:
  - T.P. - proportion of smoking people with lung cancer
  - S.P. - proportion of mice bought online with lung cancer
  - Sample - proportion of the sample with lung cancer
Like the previous example, there are some issues:
1. Mice and humans may react differently to cigarettes
2. We do not have a baseline (i.e. what does X% mean?)
3. Is having the mice smoke 20 cigarettes a day realistic?

Definition 1.6. Let a(x) denote an attribute as a function of some population or sample x. We define the study error as
a(T.P.) - a(S.P.)
Unfortunately, there is no way to directly calculate this error, so its value must be argued. In our previous examples, one could say that the study error in Ex. 1.3 is higher than that of Ex. 1.2, due to the S.P. in 1.3 being drastically different from the T.P.
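Although the study error itself is never computable in practice (a(T.P.) is unknown), a toy simulation can make Definition 1.6 concrete. All numbers below are invented for illustration: a hypothetical target population of student ages, and a study population that over-represents younger students.

```python
import random

random.seed(1)

# Hypothetical target population: ages of all STAT 231 students (made up).
target_pop = [random.choice(range(18, 26)) for _ in range(1000)]

# Hypothetical study population: only students aged 22 or under happen to be
# reachable -- not a random subset, so its attribute drifts from the T.P.'s.
study_pop = [age for age in target_pop if age <= 22][:400]

def a(pop):
    """Attribute of interest: the average age."""
    return sum(pop) / len(pop)

study_error = a(target_pop) - a(study_pop)
print(round(study_error, 3))  # positive: the S.P. under-represents older students
```

Because the study population systematically excludes older students, a(T.P.) - a(S.P.) comes out positive, which is exactly the kind of argument (rather than calculation) the definition calls for.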
Similar to the above, we define the sample error as
a(S.P.) - a(sample)
Although we may be able to calculate a(sample), a(S.P.) is not computable, and as above, this value must be argued.

Remark 1.1. Note that if we use a random sample, we hope that it is representative and that the above errors are minimized.

1.3 Data
This step involves collecting and organizing data and statistics.

Data Types
- Discrete data: simply put, there are "holes" between the numbers
- Continuous (CTS) data: we assume that there are no "holes"
- Nominal data: no order in the data
- Ordinal data: there is some order in the data
- Binary data: e.g. success/failure, true/false, yes/no
- Counting data: used for counting the number of events

1.4 Analysis
This step involves analyzing our data set and making well-informed observations and analyses.

Data Quality
There are 3 factors that we look at:
1. Outliers: data that are more extreme than their counterparts.
   (a) Reasons for outliers:
       i. Typos or data-recording errors
       ii. Measurement errors
       iii. Valid outliers
   (b) Without having been involved in the study from the start, it is difficult to tell which is which
2. Missing data points
3. Measurement issues

Characteristics of a Data Set
Outliers can be found in any data set, but these 3 characteristics always are:
1. Shape
   (a) Numerical methods: skewness and kurtosis (not covered in this course)
   (b) Empirical methods:
       i. Bell-shaped: symmetrical about a mean
       ii. Skewed left (negative): densest area on the right
       iii. Skewed right (positive): densest area on the left
       iv. Uniform: even all around; a straight line
2. Center (location)
   (a) The "middle" of our data
       i. Mode: the statistic that asks which value occurs most frequently
       ii. Median (Q2): the middle data value; the definition used in this course is an algorithm:
We denote the n data values by x_1, x_2, ..., x_n and the sorted data by x_(1), x_(2), ..., x_(n), where x_(1) <= x_(2) <= ... <= x_(n). If n is odd, then Q2 = x_((n+1)/2); if n is even, then Q2 = (x_(n/2) + x_(n/2+1)) / 2.
       iii. Mean: the sample mean is xbar = (Sum_{i=1}^n x_i) / n. The sample mean moves in the direction of an outlier if one is added (the median does as well, but to a lesser extent).
   (b) Robustness: the median is less affected by outliers and is thus called robust.
3. Spread (variability)
   (a) Range: by definition, this is x_(n) - x_(1)
   (b) IQR (interquartile range): the middle half of your data
       i. We first define posn(a) as the function which returns the index of the statistic a in a data set. Q1 is defined as the median of the data set x_(1), x_(2), ..., x_(posn(Q2)-1), and Q3 is the median of the data set x_(posn(Q2)+1), x_(posn(Q2)+2), ..., x_(n).
       ii. The IQR of a data set is defined to be the difference between Q3 and Q1. That is, IQR = Q3 - Q1.
       iii. A box plot is a visual representation of this. The whiskers of a box plot represent values some distance below or above the data set, according to the lower and upper fences, which are the lower and upper bounds of the whiskers respectively. The lower fence is LL = Q1 - 1.5(IQR) and the upper fence is UL = Q3 + 1.5(IQR). We say that any value less than LL or greater than UL is an outlier.

Here is a visualization of a box plot, courtesy of Wikipedia:
Figure 1.2: Box Plots (from Wikipedia)

   (c) Variance: for a sample, the variance is defined to be
s^2 = Sum_{i=1}^n (x_i - xbar)^2 / (n - 1)
which is called the sample variance.
   (d) Standard deviation: for a sample, the standard deviation is just the square root of the variance:
s = sqrt( Sum_{i=1}^n (x_i - xbar)^2 / (n - 1) )
which is called the sample standard deviation.
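As a minimal sketch, the median algorithm, quartiles, fences and sample variance described above can be written out directly. Note that splitting the data into halves about the median's position is one convention among several; other texts define quartiles slightly differently.

```python
def median(sorted_xs):
    """Median of already-sorted data, per the course's algorithm."""
    n = len(sorted_xs)
    if n % 2 == 1:
        return sorted_xs[n // 2]                       # x_((n+1)/2), 0-indexed
    return (sorted_xs[n // 2 - 1] + sorted_xs[n // 2]) / 2

def summary(xs):
    xs = sorted(xs)
    n = len(xs)
    q2 = median(xs)
    # Lower and upper halves about the median's position (odd n drops the
    # median itself, matching the posn(Q2) +/- 1 definition).
    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return q1, q2, q3, iqr, var, outliers

print(summary([1, 2, 3, 4, 5, 6, 7, 100]))  # the value 100 is flagged as an outlier
```

On this small data set the fences are (-3.5, 12.5), so only the extreme value 100 falls outside them, illustrating how the IQR-based rule picks out outliers while the sample variance is dragged upward by them.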
1.5 Conclusion
In the conclusion, there are only two aspects of the study that you need to be concerned about:
1. Did you answer your problem?
2. Discuss the limitations (i.e. study errors, sample errors)

2 Measurement Analysis

The goal of these measures is to explain how far our data are spread out, and the relationship between data points.

2.1 Measurements of Spread
The goal of the standard deviation is to approximate the average distance of a point from the mean. Here are some other quantities we could use for this, and why they fail:
1. (Sum_{i=1}^n (x_i - xbar)) / n does not work because it is always equal to 0.
2. (Sum_{i=1}^n |x_i - xbar|) / n works, but we cannot do anything mathematically significant with it.
3. sqrt( (Sum_{i=1}^n (x_i - xbar)^2) / n ) is a good guess, but note that Sum_{i=1}^n (x_i - xbar)^2 is not the same as Sum_{i=1}^n |x_i - xbar|.
   (a) To fix this, we use n - 1 instead of n (proof comes later on).

Proposition 2.1. An interesting identity is the following:
s = sqrt( Sum_{i=1}^n (x_i - xbar)^2 / (n - 1) ) = sqrt( (Sum_{i=1}^n x_i^2 - n xbar^2) / (n - 1) )
Proof. Exercise for the reader.

Definition 2.1. (Coefficient of Variation) This measure provides a unit-less measurement of spread:
CV = (s / xbar) x 100%

2.2 Measurements of Association
Here, we examine a few interesting measures which test the relationship between two random variables X and Y.
1. Covariance: in theory (for a population), the covariance is defined as
Cov(X, Y) = E((X - mu_X)(Y - mu_Y))
but in practice (for samples) it is defined as
s_XY = Sum_{i=1}^n (x_i - xbar)(y_i - ybar) / (n - 1)
Note that Cov(X, Y) and s_XY are real numbers, and both give us an idea of the direction of the relationship but not its magnitude.
2. Correlation: in theory (for a population), the correlation is defined as
rho_XY = Cov(X, Y) / (sigma_X sigma_Y)
but in practice (for samples) it is defined as
r_XY = s_XY / (s_X s_Y)
Note that -1 <= rho_XY, r_XY <= 1, and both give us an idea of the direction of the relationship AND its magnitude.
   (a) An interpretation of the values is as follows:
       i. |r_XY| close to 1 implies a strong relationship
       ii. |r_XY| = 1 implies a perfectly linear relationship
       iii. r_XY > 0 implies a positive relationship
       iv. r_XY < 0 implies a negative relationship
       v. |r_XY| close to 0 implies a weak relationship
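The sample versions of these association measures can be sketched directly from the formulas above; note that s_X = sqrt(s_XX), so the correlation needs nothing beyond the covariance function.

```python
from math import sqrt

def sample_cov(xs, ys):
    """s_XY = sum((x_i - xbar)(y_i - ybar)) / (n - 1)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """r_XY = s_XY / (s_X * s_Y), where s_X^2 = sample_cov(xs, xs)."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]           # an exact linear relationship: y = 2x
print(sample_corr(xs, ys))  # 1.0 up to floating point: perfectly linear
```

Doubling every y value changes the covariance but not the correlation, which is the sense in which only the correlation captures the magnitude of the relationship.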
3. Relative risk: from STAT 230, this is the probability of something happening under a condition, relative to the same thing happening if the condition is not met. Formally, for two events A and B, it is defined as
RR = P(A|B) / P(A|B')
where B' is the complement of B. An interesting property is that if RR = 1 then A and B are independent, and vice versa.
4. Slope: this will be covered later on.

3 Probability Theory

All of this content was covered in STAT 230, so I will not be typesetting it. Check out any probability textbook to review the concepts and properties of expectation and variance. The only important change was a notational one: instead of writing X ~ Bin(n, p), we write X ~ Bin(n, theta), where theta = p still.

4 Statistical Models

Recall that the goal of statistics is to guess the value of a population parameter on the basis of one (or more) sample statistics.

4.1 Generalities
We make our measurements on our sample units. The data values that are collected are:
- the response variate
- the explanatory variate(s)
The response variate is a characteristic of the unit that helps us answer the problem. It will be denoted by Y and will be assumed to be random with a random component epsilon. Every model relates the population parameters (mu, sigma, theta, ...) to the sample values (units). We will use at least one subscript representing the value of unit i. Note that a realization, y_i, is the response that is achieved by a response variate Y_i.

Example 4.1. Here is a situation involving coin flips:
Y_i = a flipped coin that has yet to land
y_i = the coin lands and realizes its potential (H/T)

In every model we assume that sampling was done randomly. This allows us to assume that epsilon_i is independent of epsilon_j for i != j.

4.2 Types of Models
The goal of statistical models is to explain the relationship between a parameter and a response variate. The following are the different types of statistical models that we will be examining:
1. Discrete (binary) model - either the population data have the attribute or they do not.
   (a) Binomial: Y_i = epsilon_i, epsilon_i ~ Bin(1, theta)
   (b) Poisson: Y_i = epsilon_i, epsilon_i ~ Pois(theta)
2. Response model - these model the response and at most use the explanatory variate implicitly, as a focal explanatory variate.
   (a) Y_i = mu + epsilon_i, epsilon_i ~ N(0, sigma^2)
   (b) Y_ij = mu_i + epsilon_ij, epsilon_ij ~ N(0, sigma^2)
   (c) Y_ij = mu + tau_i + epsilon_ij, epsilon_ij ~ N(0, sigma^2)
   (d) Y_ij = mu_i + tau + epsilon_ij
where Y_j, Y_ij are the responses of unit j [in group i], mu is the overall average, mu_i is the average in the i-th group, tau_i is the difference between the overall average and the average in the i-th group, and tau is equal to some bias.
3. Regression model - these create a function that relates the response and the explanatory variate (attribute or parameter); note here that we assume Y_i = Y_i | X_i.
   (a) Y_i = alpha + beta x_i + epsilon_i, epsilon_i ~ N(0, sigma^2)
   (b) Y_i = alpha + beta (x_i - xbar) + epsilon_i, epsilon_i ~ N(0, sigma^2)
   (c) Y_i = beta x_i + epsilon_i, epsilon_i ~ N(0, sigma^2)
   (d) Y_ij = alpha_i + beta x_ij + epsilon_ij, epsilon_ij ~ N(0, sigma^2)
   (e) Y_i = alpha + beta_1 x_1i + beta_2 x_2i + epsilon_i, epsilon_i ~ N(0, sigma^2)
where Y_j, Y_ij are the responses of unit j [in group i], alpha, alpha_i are the intercepts, beta is the slope, x_j, x_ij are the explanatory variates of unit j [in group i], and beta_1, beta_2 are the slopes for two explanatory variates (indicating that Y_i in (e) is a function of two explanatory variates). Note that in the above models, we assume that we can control our explanatory variates, and so we treat the x_i's as constant.

Theorem 4.1. (Linear Combinations of Normal Random Variables) Let X_i ~ N(mu_i, sigma_i^2), with X_i independent of X_j for all i != j, and k_i in R. Then
T = k_0 + Sum_{i=1}^n k_i X_i   implies   T ~ N( k_0 + Sum_{i=1}^n k_i mu_i , Sum_{i=1}^n k_i^2 sigma_i^2 )
Proof. Do as an exercise.

5 Estimates and Estimators

In this section, we continue to develop the relationship between our population and sample.

5.1 Motivation
Suppose that I flip a coin 5 times, with the number of heads Y ~ Bin(5, theta = 1/2). Note that theta is given. What is the value of y that maximizes f(y)?
Unfortunately, since y is discrete, the best that we can do is draw a histogram and look for the maximum. This is a very boring problem. The maximum likelihood approach, however, asks the question in reverse. That is, we find the optimal parameter theta, given y, such that f(y) is at its maximum. From the formula for f, given by
f(y) = C(5, y) theta^y (1 - theta)^(5 - y)
where C(5, y) is the binomial coefficient, we can see that if we consider f as a function of theta, it becomes a continuous function. To compute the maximum, it is just a matter of using logarithmic differentiation,
ln f(y) = ln C(5, y) + y ln(theta) + (5 - y) ln(1 - theta)
finding the partial derivative,
d ln f(y) / d theta = y / theta - (5 - y) / (1 - theta)
and setting it to zero to find the maximum:
0 = y / theta - (5 - y) / (1 - theta)   implies   y = 5 theta   implies   theta = y / 5
and so if we take theta = y/5, then f(y) is maximized.

5.2 Maximum Likelihood Estimation (MLE) Algorithm
Here, we formally describe the maximum likelihood estimation (MLE) algorithm, whose main goal is to determine which estimate of a study population parameter theta should be used, given a set of data points, such that the probability of that data set being observed is at its maximum.

Suppose that we posit that a certain population follows a given statistical model with unknown parameters theta_1, theta_2, ..., theta_m. We then draw n data points, y_1, y_2, ..., y_n, randomly (this is important) and, using the data, we try to estimate the unknown parameters. The algorithm goes as follows:
1. Define L = f(y_1, y_2, ..., y_n) = Prod_{i=1}^n f(y_i), where we call L the likelihood function. Simplify if possible. Note that f(y_1, y_2, ..., y_n) = Prod_{i=1}^n f(y_i) because we are assuming random sampling, which implies y_i is independent of y_j for all i != j.
2. Define l = ln(L). Simplify l using logarithm laws.
3. Find dl/d(theta_1), dl/d(theta_2), ..., dl/d(theta_m), set each of the partials to zero, and solve for each theta_i, i = 1, ..., m. The solved theta_i's are called the estimates, and we add a hat (theta_i with a hat) to indicate this.
To illustrate the algorithm, we give two examples.
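As a sanity check on the algorithm, the coin-flip log-likelihood from Section 5.1 can be maximized numerically over a grid of theta values in (0, 1); the maximizer should agree with the closed form theta-hat = y/5. This is only a sketch: in the examples that follow, the maximum is found analytically instead.

```python
from math import comb, log

def log_likelihood(theta, y, n=5):
    """l(theta) = ln C(n, y) + y ln(theta) + (n - y) ln(1 - theta)."""
    return log(comb(n, y)) + y * log(theta) + (n - y) * log(1 - theta)

y = 3  # observed number of heads in 5 flips
# Evaluate l on a fine grid over (0, 1) and take the arg max.
grid = [i / 10000 for i in range(1, 10000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, y))
print(theta_hat)  # 0.6, i.e. y/5
```

Working with l = ln(L) rather than L itself changes nothing about where the maximum sits (ln is increasing) but turns products into sums, which is exactly why step 2 of the algorithm exists.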
Example 5.1. Suppose Y_i = epsilon_i, epsilon_i ~ Exp(theta), where theta is a rate. What is the optimal theta according to MLE?
Solution.
1. L = Prod_{i=1}^n theta exp(-y_i theta) = theta^n exp(-theta Sum_{i=1}^n y_i)
2. l = ln(L) = n ln(theta) - theta Sum_{i=1}^n y_i
3. dl/d(theta) = n/theta - Sum_{i=1}^n y_i = n/theta - n ybar. Setting dl/d(theta) = 0 gives n/theta - n ybar = 0, so theta-hat = 1/ybar.
So the maximum l, and consequently the maximum L, is obtained when theta-hat = 1/ybar.

Example 5.2. Suppose Y_i = alpha + beta x_i + epsilon_i, epsilon_i ~ N(0, sigma^2). Use MLE to estimate alpha and beta.
Solution. First, take note that Y_i ~ N(alpha + beta x_i, sigma^2).
1. We first simplify L as follows:
L = Prod_{i=1}^n (1 / (sqrt(2 pi) sigma)) exp( -(y_i - alpha - beta x_i)^2 / (2 sigma^2) )
  = (1 / (2 pi)^(n/2)) sigma^(-n) exp( -Sum_{i=1}^n (y_i - alpha - beta x_i)^2 / (2 sigma^2) )
  = K sigma^(-n) exp( -Sum_{i=1}^n (y_i - alpha - beta x_i)^2 / (2 sigma^2) )
where K = 1 / (2 pi)^(n/2).
2. By direct evaluation,
l = ln K - n ln(sigma) - Sum_{i=1}^n (y_i - alpha - beta x_i)^2 / (2 sigma^2)
3. Computing the partials, we get
dl/d(alpha) = Sum_{i=1}^n (y_i - alpha - beta x_i) / sigma^2
and
dl/d(beta) = Sum_{i=1}^n x_i (y_i - alpha - beta x_i) / sigma^2.
So, first solving for alpha, we get
dl/d(alpha) = 0   implies   Sum_{i=1}^n (y_i - alpha-hat - beta-hat x_i) = 0
              implies   n ybar - n alpha-hat - n beta-hat xbar = 0
              implies   alpha-hat = ybar - beta-hat xbar   (5.1)
and extending to beta, we get
dl/d(beta) = 0   implies   Sum_{i=1}^n x_i (y_i - alpha-hat - beta-hat x_i) = 0
             implies   Sum_{i=1}^n x_i y_i - n xbar alpha-hat - beta-hat Sum_{i=1}^n x_i^2 = 0
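The available text breaks off mid-derivation here. As a numerical sketch under the stated model, substituting (5.1) back into the last equation leads to the standard closed-form slope beta-hat = Sum (x_i - xbar)(y_i - ybar) / Sum (x_i - xbar)^2; the code below uses that standard result together with equation (5.1) and checks that the estimates recover an exact linear trend.

```python
def mle_line(xs, ys):
    """Normal-model MLE (equivalently, least squares) estimates of alpha, beta."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Standard closed-form slope; equation (5.1) then gives the intercept.
    beta_hat = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) \
        / sum((x - xbar) ** 2 for x in xs)
    alpha_hat = ybar - beta_hat * xbar  # equation (5.1): alpha-hat = ybar - beta-hat * xbar
    return alpha_hat, beta_hat

# Noise-free data on the line y = 1 + 2x: the MLE should recover it exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(mle_line(xs, ys))  # (1.0, 2.0)
```

With epsilon_i ~ N(0, sigma^2), maximizing the likelihood is the same as minimizing the sum of squared residuals, which is why the MLE here coincides with the least squares method previewed in Section 6.2 of the table of contents.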
