
# Lecture Apr. 1, 3, 5


University of Guelph

Sociology and Anthropology

SOAN 2120

Scott Schau

Winter


Lecture Mon Apr. 1 2013
Parameter vs. Statistic
-parameter: the summary description of a given variable in a population (the parameter is constant; it does not change)
-statistic: the summary description of a variable in a sample (used to estimate the parameter)
-through statistics we try to infer the true value of a population parameter
The Sampling Distribution
-if many independent random samples are selected from a population and a statistic is calculated in each of them, the sample statistics provided by those samples will be distributed around the population parameter in a bell-shaped fashion
-this distribution of sample statistics, which approximately follows a bell (or normal) curve, is called the sampling distribution
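The idea can be illustrated with a small simulation. This is a minimal sketch, not a method from the course: it assumes a hypothetical population in which a proportion P = 0.35 of cases have some attribute, draws 2,000 independent random samples of n = 100, and shows that the sample proportions pile up around P in a rough bell shape. P, n, the number of samples and the seed are all assumptions chosen for illustration.

```python
import random
import statistics
from collections import Counter

P = 0.35          # assumed population parameter (hypothetical)
n = 100           # cases per sample (assumed)
N_SAMPLES = 2000  # number of independent random samples (assumed)

rng = random.Random(42)  # fixed seed so the run is reproducible

def count_hits(p, n, rng):
    """Draw one random sample of n cases; return how many have the attribute."""
    return sum(rng.random() < p for _ in range(n))

hits = [count_hits(P, n, rng) for _ in range(N_SAMPLES)]
sample_stats = [h / n for h in hits]  # one sample proportion per sample

# The sample statistics are centred on the parameter
print(f"parameter P = {P}")
print(f"mean of {N_SAMPLES} sample stats = {statistics.mean(sample_stats):.3f}")

# Text histogram in bins 5 percentage points wide: the bars form a bell shape
bins = Counter(h // 5 * 5 for h in hits)
for start in sorted(bins):
    bar = "#" * (bins[start] // 20)
    print(f"{start / n:.2f}-{(start + 4) / n:.2f} {bar}")
```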
Standard and Sampling Error
-sampling error: the difference between a sample statistic and the population parameter
-when we calculate the average value of this error across a whole sampling distribution, we get a measure called the standard error (s.e.)
-i.e. the average difference between a group of sample statistics and the true population parameter
-more simply: the standard error is the average dispersion in a sampling distribution (formally, the standard deviation of the sampling distribution)
Calculating (Estimating) the s.e. from a Sample Binomial Statistic
-page 195 in text
-s.e. = √ (P x Q / n)
-P is the population parameter we want to estimate (e.g. the proportion of sociology students who were born in Canada)
-Q = 1-P
-n is the number of cases in each sample
-the standard error indicates how tightly sample estimates will be distributed around a population parameter
-if it is small, the sample statistics closely resemble the population parameter
-e.g. if you increase n, you will get a smaller standard error
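A minimal Python sketch of the binomial s.e. formula, checked against a simulated sampling distribution. The function name `standard_error` and the simulation settings are illustrative assumptions; the values P = 0.35 and n = 100 are borrowed from the McGill example later in these notes. The standard deviation of many simulated sample proportions should land close to √ (P x Q / n).

```python
import math
import random
import statistics

def standard_error(p, n):
    """Binomial s.e. = sqrt(P x Q / n), where Q = 1 - P."""
    q = 1 - p
    return math.sqrt(p * q / n)

P, n = 0.35, 100  # illustrative values (from the McGill example)
formula_se = standard_error(P, n)

# Simulate a sampling distribution: 5,000 random samples of n cases each
rng = random.Random(1)  # fixed seed for reproducibility
props = [sum(rng.random() < P for _ in range(n)) / n for _ in range(5000)]

# The spread (standard deviation) of the sample stats approximates the s.e.
simulated_se = statistics.pstdev(props)

print(f"formula s.e.:   {formula_se:.4f}")   # formula s.e.:   0.0477
print(f"simulated s.e.: {simulated_se:.4f}")
```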
Factors Affecting the Standard Error
(you always want the standard error to be as small as possible)
-how can the standard error become smaller?
-the standard error increases as a function of P x Q, which is highest when there is an even split between the two
i.e. both P and Q = 0.5 (higher heterogeneity)
-homogeneous populations (P close to 0 or 1) render a smaller s.e.
-the standard error is also a function of the sample size n, and will decrease as the sample size gets larger
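Both factors can be seen by evaluating the same formula over a range of values. This sketch uses arbitrary, assumed inputs: P x Q peaks at P = 0.5, and because n sits under a square root, quadrupling the sample size only halves the s.e.

```python
import math

def standard_error(p, n):
    """s.e. = sqrt(P x Q / n), with Q = 1 - P."""
    return math.sqrt(p * (1 - p) / n)

n = 100  # fixed sample size for the heterogeneity comparison (assumed)

# Heterogeneity: P x Q (and so the s.e.) is largest at an even split
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P={p:.1f}  PxQ={p * (1 - p):.2f}  s.e.={standard_error(p, n):.4f}")

# Sample size: each quadrupling of n halves the s.e.
p_even = 0.5
for size in (25, 100, 400, 1600):
    print(f"n={size:<5} s.e.={standard_error(p_even, size):.4f}")
```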
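-this is related to the CENTRAL LIMIT THEOREM: as n grows, the sampling distribution becomes more nearly normal and clusters more tightly around the parameter
S.E. in the Normal Distribution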
-certain proportions of the sample statistics will fall within a specified number of standard errors from the population parameter
-approx. 68% of sample stats will fall within ±1 s.e. of the population parameter
-approx. 95% of sample stats will fall within ±2 s.e. of the population parameter (more precisely, 95% fall within ±1.96 s.e.)
-approx. 99.7% of sample stats will fall within ±3 s.e. of the population parameter
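These shares come from the normal curve and can be computed directly: for a normal distribution, the proportion lying within ±k standard errors of the centre is erf(k / √2). A short sketch using Python's standard `math.erf`; the helper name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Share of a normal curve lying within ±k standard errors of its centre."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within ±{k} s.e.: {share_within(k):.2%}")
```

Run as written, it prints about 68.27%, 95.00%, 95.45% and 99.73%, which is why ±2 s.e. is treated as roughly 95% and ±3 s.e. as roughly 99.7%.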
Level of Confidence
-we can express the accuracy of a sample stat in terms of the level of confidence that the stat falls within a specified interval from the parameter
-CONFIDENCE INTERVAL: a range within which we expect a sample stat to lie a given percentage of the time (the confidence level); equivalently, the range around a sample stat that we expect to contain the population parameter that percentage of the time
-e.g. 95% or 68% of the time we expect a sample stat to fall within a range of numbers (the confidence interval)
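The interval P ± 2 x s.e. can be sketched in Python. This minimal sketch assumes a proportion (binomial) statistic and the ±2 s.e. rule for 95%; `confidence_interval` is a hypothetical helper, and the inputs (0.35, n = 100) are taken from the McGill example in these notes. The second half simulates repeated samples from an assumed true parameter to check that a sample stat lands within 2 s.e. of it roughly 95% of the time.

```python
import math
import random

def confidence_interval(p, n, z=2):
    """Approximate interval P ± z x s.e. for a proportion, s.e. = sqrt(P x Q / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Values from the McGill voting example (sample proportion 0.35, n = 100)
low, high = confidence_interval(0.35, 100)
print(f"95% c.i. = ({low:.3f}, {high:.3f})")  # 95% c.i. = (0.255, 0.445)

# Simulation under an ASSUMED true parameter: how often does a sample stat
# fall within 2 s.e. of it?
P_TRUE, n = 0.35, 100
se_true = math.sqrt(P_TRUE * (1 - P_TRUE) / n)
rng = random.Random(7)  # fixed seed for reproducibility
trials = 10_000
inside = 0
for _ in range(trials):
    p_hat = sum(rng.random() < P_TRUE for _ in range(n)) / n
    if abs(p_hat - P_TRUE) <= 2 * se_true:
        inside += 1
print(f"share within ±2 s.e.: {inside / trials:.1%}")  # close to 95%
```

The sketch keeps the unrounded s.e. (about 0.0477), so it prints (0.255, 0.445); rounding the s.e. to 0.048 first gives (0.254, 0.446). Because the check |p_hat - P| ≤ 2 s.e. is symmetric, the same count is also how often the interval p_hat ± 2 s.e. (with s.e. computed from P) contains P.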
Calculating Confidence Intervals
Example:
-Prof. J. Benhaim wants to estimate the proportion of McGill students who voted in the past Canadian federal election. To do so (instead of asking every McGill student whether they voted or not), she collects a random sample of 100 students. She finds that 35% of those students voted in the past federal election.
How would she go about getting a 95% confidence interval for the proportion of McGill students who voted in the past federal election?
ANSWER
-assign your values…
- n = 100
- P = 0.35 (a sample stat that estimates the parameter)
- the centre of our interval is at 35%
-next calculate standard error
s.e. = √ (P x Q / n)
P = 0.35, Q = (1 - 0.35) = 0.65, n = 100
-plug in the values to get the answer:
s.e. = √ (0.35 x 0.65 / 100) = √ 0.002275 ≈ 0.048
-next calculate the confidence interval
95% c.i. = P ± 2 x s.e.
= 0.35 ± 2 x 0.048
= (0.254, 0.446), or between 25.4% and 44.6%

Lecture Wed Apr. 3 2013
Statistics
Quantitative Analysis
-numerical representation and manipulation of observations for the purpose of describing
and explaining the phenomena that those observations reflect
Coding
-for computers to analyze your data, you must translate it into a form they can read
-coding schemes are guided by theory
-codebook: document that describes the location of variables within a dataset and lists
the codes assigned to the attributes composing those variables
-purposes:
-primary guide used in the coding process
-guide for locating variables and interpreting codes in the data file during analysis
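A codebook can be sketched as a simple lookup table. In this minimal Python example everything is hypothetical: the variable names, column positions and codes are invented for illustration, not taken from any real study. The same structure serves both purposes listed above: guiding the coding and interpreting codes during analysis.

```python
# Hypothetical codebook: for each variable, its column position in the data
# file and the codes assigned to its attributes (all names/codes invented)
CODEBOOK = {
    "voted": {
        "column": 1,
        "codes": {1: "Yes", 2: "No", 9: "No answer"},
    },
    "born_in_canada": {
        "column": 2,
        "codes": {1: "Yes", 2: "No", 9: "No answer"},
    },
}

def decode(variable, code):
    """Look up the attribute label that a numeric code stands for."""
    return CODEBOOK[variable]["codes"].get(code, "UNKNOWN CODE")

# One coded case (a row of the data file), in column order
row = [1, 2]
for name, entry in CODEBOOK.items():
    code = row[entry["column"] - 1]
    print(f"{name}: {code} -> {decode(name, code)}")
```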
Data Entry
-data entry specialists enter data into statistical software or an Excel spreadsheet
-optical scan sheets
-sometimes it is part of the process of data collection
Data Cleaning
-possible-code cleaning
-codes that have not been assigned to any attribute of a variable (invalid codes) are identified and removed
-contingency cleaning
-checking that only those cases that should have data entered for a particular
variable do in fact have such data
-once your data have been entered and cleaned, you can start your data analysis
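Both cleaning steps can be sketched as simple checks. In this hypothetical Python example the cases, the variables `voted` and `party`, and their valid codes are all invented; the contingency rule assumed here is that only respondents who voted (code 1) should have a party recorded.

```python
# Hypothetical coded cases; None means "no data entered"
cases = [
    {"id": 1, "voted": 1, "party": 3},
    {"id": 2, "voted": 2, "party": None},
    {"id": 3, "voted": 7, "party": None},  # 7 is not a valid code for "voted"
    {"id": 4, "voted": 2, "party": 1},     # non-voter with a party recorded
]

# Assumed valid codes for each variable
VALID_CODES = {"voted": {1, 2, 9}, "party": {1, 2, 3, 4, 9}}

def possible_code_errors(cases):
    """Possible-code cleaning: flag values not assigned to any attribute."""
    errors = []
    for case in cases:
        for var, valid in VALID_CODES.items():
            value = case[var]
            if value is not None and value not in valid:
                errors.append((case["id"], var, value))
    return errors

def contingency_errors(cases):
    """Contingency cleaning: only voters (voted == 1) should have a party."""
    return [case["id"] for case in cases
            if case["voted"] != 1 and case["party"] is not None]

print(possible_code_errors(cases))  # [(3, 'voted', 7)]
print(contingency_errors(cases))    # [4]
```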
Univariate Analysis
-univariate analysis: the examination of the distribution of cases of only one variable
-descriptive purpose
-one-way frequency distributions
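A one-way frequency distribution counts how many cases fall into each attribute of a single variable. A minimal sketch using Python's `collections.Counter`; the responses are invented for illustration.

```python
from collections import Counter

# Hypothetical responses to one variable (invented for illustration)
responses = ["Yes", "No", "Yes", "Yes", "No", "No answer", "Yes", "No", "Yes", "Yes"]

freq = Counter(responses)
total = len(responses)

# One row per attribute: count and percentage of all cases
print(f"{'Attribute':<10} {'n':>3} {'%':>6}")
for attribute, count in freq.most_common():
    print(f"{attribute:<10} {count:>3} {count / total:>6.1%}")
print(f"{'Total':<10} {total:>3} {1.0:>6.1%}")
```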
Measures of Central Tendency
-measure of the “average” or “typical” value of a variable
-mode: most frequent value
e.g. for mode:
7 2 16 7 4 0 6 13 4 5
12 0 1 3 9 8 45 3 0 9
Mode = 0
Interpretation: amongst these 20 students, the most common number of times a student had Kraft Dinner was 0
-mean: the sum of all values of a variable divided by the total number of cases
formula: mean = sum of X / n
e.g. for mean: below is the number of times each student in a sample of 20 had Kraft Dinner throughout the semester
7 2 16 7 4 0 6 13 4 5
12 0 1 3 9
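Both measures can be computed with Python's standard `statistics` module. The mean example's data are cut off in these notes, so this sketch uses the 20 Kraft Dinner counts listed in the mode example instead; it is an illustration, not the lecture's own answer for the mean.

```python
import statistics

# Kraft Dinner counts for 20 students, as listed in the mode example
kd = [7, 2, 16, 7, 4, 0, 6, 13, 4, 5,
      12, 0, 1, 3, 9, 8, 45, 3, 0, 9]

# Mode: the most frequent value
print("mode:", statistics.mode(kd))  # mode: 0

# Mean: sum of X divided by n
mean = sum(kd) / len(kd)
print("mean:", mean)                 # mean: 7.7
```

The single large value (45) pulls the mean well above the mode, a reminder that the mean is sensitive to extreme values.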
