Lecture Mon Apr. 1 2013

Parameter vs. Statistic
-parameter: the summary description of a given variable in a population (the parameter is a constant; it does not change)
-statistic: the summary description of a variable in a sample (used to estimate the parameter)
-through statistics we try to infer the true value of a population parameter

The Sampling Distribution
-if many independent random samples are selected from a population and a statistic is calculated in each of them, the sample statistics will be distributed around the population parameter in a bell-shaped fashion
-this bell-shaped (normal) distribution of sample statistics is called the sampling distribution

Standard and Sampling Error
-when we calculate the average value of the sampling error (the difference between a sample statistic and the population parameter) across a whole sampling distribution, we get a measure called the standard error (s.e.)
-i.e. the average difference between a group of sample statistics and the true population parameter
-more simply: the standard error is the average dispersion in a sampling distribution (formally, the standard deviation of the sampling distribution)

Calculating (Estimating) the s.e. for a Binomial Sample Statistic
-page 195 in text
-s.e. = √(P × Q / n)
-P is the population parameter we want to estimate (e.g. the proportion of sociology students who were born in Canada); since P is unknown, the sample proportion is plugged in as its estimate
-Q = 1 − P
-n is the number of cases in each sample
-the standard error indicates how tightly sample estimates will be distributed around a population parameter
-if it is small, the sample statistics closely resemble the population parameter
-e.g. if you increase n, you will get a smaller standard error

Factors Affecting the Standard Error (you always want the standard error to be as small as possible)
-how can the standard error become smaller?
-the standard error increases as a function of P × Q, which is highest when there is an even split between the two, i.e. P = Q = 0.5 (P × Q = 0.25); higher heterogeneity means a larger s.e.
-homogeneous populations render a smaller s.e.
-the standard error is also a function of the sample size n, and will decrease as the sample size gets larger (because n sits under the square root, quadrupling n halves the s.e.)
-this is related to the CENTRAL LIMIT THEOREM: as n grows, the sampling distribution approaches a normal curve centred on the population parameter, and its spread shrinks

S.E. in the Normal Distribution
-known proportions of the sample statistics will fall within a specified number of standard errors of the population parameter
-approx. 68% of sample stats will fall within ±1 s.e. of the population parameter
-approx. 95% of sample stats will fall within ±2 s.e. of the population parameter (more precisely ±1.96 s.e.)
-approx. 99.7% of sample stats will fall within ±3 s.e. of the population parameter

Level of Confidence
-we can express the accuracy of a sample stat in terms of the level of confidence that the stat falls within a specified interval from the parameter
-CONFIDENCE INTERVAL: a range in which we expect a sample stat to lie a given percentage of the time (the confidence level); equivalently, the range around our sample stat that we expect to contain the parameter that percentage of the time
-e.g. 95% or 68% of the time we expect a sample stat to fall within a range of numbers (the confidence interval)

Calculating Confidence Intervals
Example:
-Prof. J. Benhaim wants to estimate the proportion of McGill students who voted in the last Canadian federal election. To do so (instead of asking every McGill student whether they voted), she collects a random sample of 100 students. She finds that 35% of those students voted in the last federal election. How would she go about getting a 95% confidence interval for the proportion of McGill students who voted in the last federal election?
ANSWER
-assign your values:
  -n = 100
  -P = 0.35 (35%, a stat that estimates the parameter)
-the middle of our curve is at 35%
-next calculate the standard error:
  s.e. = √(P × Q / n), with P = 0.35, Q = 1 − 0.35 = 0.65, n = 100
-plug in the values: s.e. = √(0.35 × 0.65 / 100) = √0.002275 ≈ 0.048
-next calculate the confidence interval:
  95% c.i. = P ± 2 × s.e. = 0.35 ± 2 × 0.048 = 0.35 ± 0.096 = (0.254, 0.446), or between 25.4% and 44.6%

Lecture Wed Apr. 3 2013

Statistics

Quantitative Analysis
-numerical representation and manipulation of observations for the purpose of describing and explaining the phenomena that those observations reflect

Coding
-for a computer to analyze your data, you must translate the data into a form the software can read (numeric codes)
-coding schemes are guided by theory
-codebook: a document that describes the location of variables within a dataset and lists the codes assigned to the attributes composing those variables
-purposes of a codebook:
  -primary guide used in the coding process
  -guide for locating variables and interpreting codes in the data file during analysis

Data Entry
-data entry specialists enter data into statistical software or an Excel spreadsheet
-optical scan sheets can be read directly by a machine
-sometimes data entry is part of the data collection process itself

Data Cleaning
-possible-code cleaning: codes that have not been assigned to any attribute of a variable are found and removed
-contingency cleaning: checking that only those cases that should have data entered for a particular variable do in fact have such data
-once your data are entered and cleaned, you can start your data analysis

Univariate Analysis
-univariate analysis: the examination of the distribution of cases on only one variable
-descriptive purpose
-one-way frequency distributions

Measures of Central Tendency
-measure of the “average” or “typical” value of a variable
-mode: the most frequent value
e.g. for mode: 7 2 16 7 4 0 6 13 4 5 12 0 1 3 9 8 45 3 0 9
Mode = 0
Interpretation: amongst these 20 students, the most common number of times a student had Kraft Dinner was 0
-mean: the sum of all values of a variable divided by the total number of cases
formula: mean = ΣX / n
e.g. for mean: below is the number of times a sample of 20 students had Kraft Dinner throughout the semester
7 2 16 7 4 0 6 13 4 5 12 0 1 3 9
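The two measures of central tendency above can be computed directly. This Python sketch assumes that the mean example uses the same 20 counts listed in the mode example, because the mean example's list is cut off after 15 values. The helper names `mode` and `mean` are illustrative. Python's standard `statistics` module also offers `multimode` and `mean`.

```python
from collections import Counter

# Number of times each of 20 students had Kraft Dinner (values from the mode example).
# Assumption: the mean example uses this same list.
kd_counts = [7, 2, 16, 7, 4, 0, 6, 13, 4, 5, 12, 0, 1, 3, 9, 8, 45, 3, 0, 9]


def mode(values):
    """Return the most frequent value(s) as a sorted list, since ties are possible."""
    tallies = Counter(values)
    top = max(tallies.values())
    return sorted(v for v, c in tallies.items() if c == top)


def mean(values):
    """Sum of X divided by n."""
    return sum(values) / len(values)


print(mode(kd_counts))  # [0]
print(mean(kd_counts))  # 7.7
```

Under that assumption, the mean is 154 / 20 = 7.7. This is well above the mode of 0, largely because of the single student who reported 45.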

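The arithmetic from the Apr. 1 lecture can be reproduced with a short Python sketch. This covers the binomial standard-error formula, the P × Q and sample-size effects on it, and Prof. Benhaim's 95% confidence interval. The function names are illustrative. The default z = 2 follows the lecture's ±2 s.e. rule for 95%; 1.96 is the more exact multiplier.

```python
import math


def standard_error(p, n):
    """Estimated s.e. of a sample proportion: sqrt(P * Q / n), with Q = 1 - P."""
    q = 1 - p
    return math.sqrt(p * q / n)


def confidence_interval(p, n, z=2):
    """Approximate CI: P +/- z * s.e. (z = 2 for ~95%, as in the lecture)."""
    se = standard_error(p, n)
    return p - z * se, p + z * se


# Worked example: 35% of a random sample of 100 McGill students voted.
se = standard_error(0.35, 100)
low, high = confidence_interval(0.35, 100)
print(round(se, 3))                   # 0.048
print(round(low, 3), round(high, 3))  # 0.255 0.445

# P x Q is largest at P = 0.5, so for a fixed n the s.e. peaks there.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(standard_error(p, 100), 4))

# Quadrupling n halves the s.e., because n sits under the square root.
print(round(standard_error(0.5, 100), 3), round(standard_error(0.5, 400), 3))  # 0.05 0.025
```

The sketch doubles the unrounded s.e. (about 0.0477) rather than 0.048. As a result, it prints (0.255, 0.445) instead of the notes' (0.254, 0.446); the difference is due to rounding only.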