# Study Guide for Final Exam STAT231.pdf

8 Pages

School: University of Waterloo
Department: Statistics
Course: STAT 231
Professor: Andrew K C Wong
Semester: Fall

Description
STAT231 Final Exam Review

LaTeXer: W. Kong

## 1 PPDAC

PPDAC = Problem / Plan / Data / Analysis / Conclusion (see the final page for a summary).

Definition 1.1. The target population (T.P.) is the set of animals, people or things about which you wish to draw conclusions. A unit is a single member of the target population.

Definition 1.2. The study population (S.P., also called the sample population in these notes) is a specified subset of the target population. A sample is a subset of the study population, and each of its members is a unit of the study population.

Definition 1.3. A variate is a characteristic of a single unit in a target population and is usually one of the following:

1. Response variates - variates of interest in the study
2. Explanatory variates - variates that explain why responses vary from unit to unit
   - (a) Known - variates that are known to cause the responses
     - i. Focal - known variates that divide the target population into subsets
   - (b) Unknown - variates that cause the responses but cannot be explained

Definition 1.4. An attribute is a characteristic of a population, usually expressed as a function of the response variate. It has two other names, depending on the population studied: a parameter (for the target population) and a statistic (for a sample).

Definition 1.5. The aspect is the goal of the study and is generally one of the following: descriptive, comparative, causative, and predictive.

Note 1. $\text{T.P.} \supseteq \text{S.P.} \supseteq \text{Sample}$

Definition 1.6. Let $a(x)$ be an attribute, written as a function of some population or sample $x$. We define the study error as $a(\text{T.P.}) - a(\text{S.P.})$.

Definition 1.7. Similarly, we define the sample error as $a(\text{S.P.}) - a(\text{sample})$.

## 2 Measurement Analysis

The goal of these measures is to describe how far our data is spread out and how data points relate to one another.

### 2.1 Measurements of Spread

Definition 2.1. The coefficient of variation (CV) provides a unit-less measurement of spread:

$$CV = \frac{s}{\bar{x}} \times 100\%$$

### 2.2 Measurements of Association

1. Covariance: In theory (a population), the covariance is defined as $\mathrm{Cov}(X,Y) = E\big((X - \mu_X)(Y - \mu_Y)\big)$, but in practice (in samples) it is defined as
   $$s_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}.$$
   Note that $\mathrm{Cov}(X,Y), s_{XY} \in \mathbb{R}$ and both give us an idea of the direction of the relationship but not the magnitude.
2. Correlation: In theory (a population), the correlation is defined as $\rho_{XY} = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$, but in practice (in samples) it is defined as $r_{XY} = \dfrac{s_{XY}}{s_X s_Y}$. Note that $-1 \le \rho_{XY}, r_{XY} \le 1$ and both give us an idea of the direction of the relationship AND the magnitude.
   - (a) An interpretation of the values is as follows: $|r_{XY}| \approx 1 \implies$ strong relationship, $|r_{XY}| = 1 \implies$ perfectly linear relationship, $r_{XY} > 0 \implies$ positive relationship, $r_{XY} < 0 \implies$ negative relationship, $|r_{XY}| \approx 0 \implies$ weak relationship.
3. Relative risk: From STAT230, this is the probability of something happening under a condition relative to the probability of the same thing happening if the condition is not met. Formally, for two events $A$ and $B$, it is defined as $RR = \dfrac{P(A \mid B)}{P(A \mid \bar{B})}$. An interesting property is that if $RR = 1$ then $A \perp B$, and vice versa.
4. Slope: This will be covered later on.

## 3 Statistical Models

Recall that the goal of statistics is to guess the value of a population parameter on the basis of one or more sample statistics.

### 3.1 Types of Models

Goal of statistical models: explain the relationship between a parameter and a response variate. The following are the different types of statistical models that we will be examining:

1. Discrete (Binary) Model - either the population data is within parameters or it is not.
2. Response Model - these model the response and at most use the explanatory variate implicitly as a focal explanatory variate.
3. Regression Model - these create a function that relates the response and the explanatory variate (attribute or parameter); note here that we assume Y i Y ji.

## 4 Estimates and Estimators

Here, we only review the main ideas of estimates and estimators.
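As a concrete check on the measures of spread and association from Section 2, the following is a minimal sketch that computes the coefficient of variation, the sample covariance, the sample correlation and a relative risk. The function names, the data in `xs` and `ys`, and the counts passed to `relative_risk` are all made up for illustration and are not part of the original notes.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation s, using the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def coefficient_of_variation(values):
    """CV = s / x-bar * 100, a unit-less measure of spread (in percent)."""
    return sample_sd(values) / mean(values) * 100


def sample_covariance(xs, ys):
    """s_XY = sum((x_i - x-bar)(y_i - y-bar)) / (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_correlation(xs, ys):
    """r_XY = s_XY / (s_X * s_Y), always between -1 and 1."""
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


def relative_risk(a_and_b, b, a_and_not_b, not_b):
    """RR = P(A | B) / P(A | not B), estimated from counts."""
    return (a_and_b / b) / (a_and_not_b / not_b)


# Hypothetical data: ys is exactly 2 * xs, so the relationship is perfectly linear.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

print("CV of xs (%):", coefficient_of_variation(xs))
print("s_XY:", sample_covariance(xs, ys))
print("r_XY:", sample_correlation(xs, ys))
# Hypothetical counts: 30 of 100 units with B show A, 10 of 100 without B show A.
print("RR:", relative_risk(30, 100, 10, 100))
```

Because `ys` is an exact multiple of `xs`, the correlation comes out as 1 up to floating-point error, while the covariance alone only signals that the relationship is positive.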
### 4.1 Maximum Likelihood Estimation (MLE) Algorithm

1. Define $L = f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i)$, where we call $L$ a likelihood function. Simplify if possible. Note that $f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i)$ because we are assuming random sampling, implying that $y_i \perp y_j$, $\forall i \ne j$.
2. Define $l = \ln(L)$. Simplify $l$ using logarithmic laws.
3. Find $\dfrac{\partial l}{\partial \theta_1}, \dfrac{\partial l}{\partial \theta_2}, \ldots, \dfrac{\partial l}{\partial \theta_n}$, set each of the partials to zero, and solve for each $\theta_i$, $i = 1, \ldots, n$. The solved $\theta_i$'s are called the estimates of $f$ and we add a hat, $\hat{\theta}_i$, to indicate this.

### 4.2 Estimators

$\hat{\theta}$ is the realization (from a sample) of a distribution of estimates. The distribution is called an estimator and is denoted by $\tilde{\theta}$.

### 4.3 Biases in Statistics

Definition 4.1. We say that an estimator $\tilde{\theta}$ of a model parameter $\theta$ is unbiased if $E(\tilde{\theta}) = \theta$ holds. Otherwise, we say that the estimator is biased.

## 5 Distribution Theory

We introduce the following new distributions.

- If $X \sim N(0,1)$ then $X^2 \sim \chi^2_1$, which we call a Chi-squared (pronounced "Kai-Squared") distribution on one degree of freedom.
- Let $X \sim \chi^2_m$ and $Y \sim \chi^2_n$ with $X \perp Y$. Then $X + Y \sim \chi^2_{n+m}$, which is a Chi-squared on $n + m$ degrees of freedom.
- Let $N \sim N(0,1)$, $X \sim \chi^2_v$, $X \perp N$. Then $\dfrac{N}{\sqrt{X/v}} \sim t_v$, which we call a Student's t-distribution on $v$ degrees of freedom.

Properties of the Student's t-Distribution

- This distribution is symmetric.
- For $T \sim t_v$ with $v > 30$, the Student's t is almost identical to the normal distribution with mean 0 and variance 1.
- For $v \le 30$, $T$ has thicker tails and a flatter, less pronounced center than the standard normal distribution.

### 5.1 Least Squares Method

For a given model $Y$ and parameter $\theta$, suppose that we get a best fit $\hat{y}$ and define the residuals $\hat{\epsilon}_i = y_i - \hat{y}_i$. The least squares approach can be carried out in either of two ways:

1. (Algebraic) Define $W = \sum_{i=1}^{n} \hat{\epsilon}_i^2$. Calculate $\dfrac{\partial W}{\partial \theta}$ and set it to zero to determine $\hat{\theta}$.
2. (Geometric) Define $W = \sum_{i=1}^{n} \hat{\epsilon}_i^2 = \hat{\epsilon}^t \hat{\epsilon}$. Note that $\hat{\epsilon} \perp \mathrm{span}\{\mathbf{1}, \mathbf{x}\}$ and so $\hat{\epsilon}^t \mathbf{1} = 0$ and $\hat{\epsilon}^t \mathbf{x} = 0$. Use these equations to determine $\hat{\theta}$.

## 6 Intervals

Here, we deviate from the order of lectures and focus on the various types of constructed intervals. However, in this section, I will only provide the formulas and not the motivation.

| Name | Formula | Properties |
| --- | --- | --- |
| Confidence Intervals | $EST \pm c\,SE = \hat{\theta} \pm c\sqrt{\mathrm{Var}(\tilde{\theta})}$ | If $\mathrm{Var}(\tilde{\theta})$ is known, then $C \sim N(0,1)$; if it is unknown, we replace $\mathrm{Var}(\tilde{\theta})$ with $\widehat{\mathrm{Var}}(\tilde{\theta})$ and $C \sim t_{n-q}$. When $\alpha = 5\%$, which affects our value of $c$, we are constructing a 95% confidence interval. |
| Predicting Intervals | $EST \pm c\,SE = f(\hat{\theta}) \pm c\sqrt{\mathrm{Var}(Y_p)}$ | Same as above, except that $Y_p$ differs from a standard model $Y_i = f(\theta) + \epsilon_i$ in that the first component is random (i.e. $Y_p = f(\tilde{\theta}) + \epsilon_p$). When $\alpha = 5\%$, which affects our value of $c$, we are constructing a 95% prediction interval. |
| Likelihood Intervals | Solution of $R(\theta) = \dfrac{L(\theta)}{L(\hat{\theta})}$, where $L(\theta; y_i) = \prod_{i=1}^{n} f(y_i; \theta)$ | Solving $R(\theta) \ge 0.1$ gives a 10% likelihood interval for $\theta$, which plays roughly the role of a 95% confidence interval. This interval is particularly useful for models that are not necessarily normal. |

## 7 Hypothesis Testing

While our confidence interval does not tell us in a yes or no way whether or not a statistical estimate is true, a hypothesis test does. Here are the steps:

1. State the hypothesis, $H_0 : \theta = \theta_0$ (this is only an example), called the null hypothesis ($H_1$ is called the alternative hypothesis and makes a statement contrary to the null hypothesis).
2. Calculate the discrepancy (also called the test statistic), denoted by
   $$d = \frac{\hat{\theta} - \theta_0}{\sqrt{\mathrm{Var}(\tilde{\theta})}} = \frac{\text{estimate} - H_0\text{ value}}{SE},$$
   assuming that $\tilde{\theta}$ is unbiased. Here $d$ is the realization of a random variable $D$, which is $N(0,1)$ if $\mathrm{Var}(\tilde{\theta})$ is known and $t_{n-q}$ otherwise. Note that $d$ is the number of standard deviations $\theta_0$ is from $\hat{\theta}$.
3. Calculate a p-value given by $p = 2P(D > |d|)$. It is als
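The confidence interval formula from Section 6 and the test steps listed above can be sketched for the known-variance case, where $D \sim N(0,1)$. This is only an illustration under that assumption: the helper names `z_test` and `confidence_interval`, and the numbers used (an estimate of 11.0, a null value of 10.0, a known standard deviation of 2.0 and a sample size of 16) are hypothetical and do not come from the notes. When the variance is unknown, the notes use $t_{n-q}$ instead, which this sketch does not cover.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_test(estimate, null_value, standard_error):
    """Return (d, p): the discrepancy d and the two-sided p-value 2 * P(D > |d|)."""
    d = (estimate - null_value) / standard_error
    p = 2 * (1 - STANDARD_NORMAL.cdf(abs(d)))
    return d, p


def confidence_interval(estimate, standard_error, alpha=0.05):
    """Return EST -/+ c * SE, where c is the upper alpha/2 point of N(0,1)."""
    c = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return estimate - c * standard_error, estimate + c * standard_error


# Hypothetical inputs: sample mean 11.0 from n = 16 units, known sigma = 2.0.
estimate = 11.0
null_value = 10.0
standard_error = 2.0 / math.sqrt(16)  # SE of the sample mean = sigma / sqrt(n)

d, p = z_test(estimate, null_value, standard_error)
lower, upper = confidence_interval(estimate, standard_error)

print("discrepancy d:", d)
print("two-sided p-value:", p)
print("95% confidence interval:", (lower, upper))
```

With these made-up inputs the estimate sits two standard errors above the null value, so the p-value falls just under 5% and the 95% interval narrowly excludes 10.0, which shows how the interval and the test agree.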