ECO220 MidTerm 1 Exam Notes

Jennifer Murdock

ECO220 Notes

Lecture 1: Sampling Errors & Non-Sampling Errors

• Goal → to make inferences about population parameters from sample statistics
• Probability: foundation for statistics
• Statistics: descriptive and inferential
  o Descriptive → describes what happened (ex. class avg)
  o Inferential → draws conclusions from data when we are not 100% sure
  o Describe the sample (data) using statistics
  o Make inferences about the population and its parameters using observed data (sample)
• Population = set of all items of interest (ex. all students @ UofT for evaluations)
• Parameter = descriptive measure of a population (something that describes the population, ex. what %/fraction of the population)
• Sample = subset of the population (ex. small group)
• Statistic = descriptive measure of a sample (ex. of 200 in sample, response __)

Sampling Error ('white noise', 'sample noise', 'sampling variability') = the purely random differences between a sample and the population that arise b/c the sample is a random subset of the population
• As sample size gets larger, the sampling error tends to get smaller
• Ex. Pick 200 out of 60,000; two samples could give extremely different results (due to chance)
  o Not wrong: b/c it is a random sample, there is nothing wrong with the survey itself
• Size of sample determines the size of sampling error
  o Larger samples = less sampling error

Example: bag of m&m's, choose using a spoon; n = # of m&m's, y = % of yellow in n

  n     y
  4     0/4 = 0%
  13    2/13 = 15.4%

• Population: whole bag of m&m's (all)
• Parameter: what %/portion are yellow
• Sample: taken 2 times
• Sample statistic → % yellow

Law of Large Numbers
• Larger samples = smaller sampling errors
  o Sampling error decreases as n (sample size) increases
  o There is no law of small numbers (the supposed 'law' = small samples represent the population)

Example: movie on demand; should a rural company offer a new channel? Randomly select 100 ppl, ask 2 different questions
• Population: the company's rural customer base
• Sample: 100 customers
• Sample statistics: mean = 2.3, proportion = 0.45
• Population parameters not known b/c of sampling error

The Types of Information
• Variables = characteristics recorded about each individual or case (types of info)
• Quantitative = numerical measurements of a quantity or amount
  o Ex. A 10% decrease in price will lead to a 20% increase in QD
• Qualitative = some assessment of quality or kind
  o Ex. An increase in price tends to lead to a decrease in QD
• Identifier variable = unique code for each product/customer

Data
• Rows of a data table correspond to individual cases
• People who answer a survey = respondents; people experimented on = subjects/participants/experimental units
• # of observations = sample size
• Data = plural ('these data are flawed'); datum = singular (1)

3 Types of Data
• Interval = numerical measurements, real numbers that are quantitative/numerical (ex. How many marriages?)
• Ordinal = ranking of categories (ex. How would you rank marital status?)
• Nominal = un-ranked categories that are qualitative/categorical (only use names)

Hierarchy of Data
1. Interval - real numbers -> all calculations are valid
2. Ordinal - numbers must represent the ranked order -> calculations based on the ordering are valid
3. Nominal - arbitrary numbers that represent categories -> only calculations based on frequency are valid

3 Types of Data Sets
• Cross-sectional = a snapshot of different units taken in the same time period
  o Ex. Annual GDP for 2010 for 20 countries (20 observations)
• Time Series = track something over time
  o Stationary time series = without a strong trend or change in variability (then a histogram can be used with time series data)
  o Ex. Annual Canadian GDP from 2001 until 2010 (10 observations)
• Panel (Longitudinal) = a cross-section of units where each is followed over time
  o Ex. Annual GDP of 20 countries from 2001 until 2010 (200 observations)

Sampling
• Stratified Sampling = a sampling design in which the population is divided into several homogeneous subpopulations, or strata, and random samples are then drawn from each stratum
  o Strata = subsets of a population that are internally homogeneous but may differ from one another
• Systematic Sampling = a sample drawn by selecting individuals systematically from a sampling frame
• Convenience Sampling = a sampling technique that selects individuals who are conveniently available
  o May not represent the population
• Cluster Sampling = a sampling design in which groups, or clusters, representative of the population are chosen at random and a census is then taken of each
• Multistage Sampling = sampling schemes that combine several sampling methods
• Sample size determines what can be concluded from the data, regardless of the size of the population
• Voluntary Response Sample
  o Hard to define the sampling frame; doesn't correspond to the population
  o Bias toward those with strong opinions (especially negative opinions)
• Simple Random Sample (SRS) = a sample in which each set of n individuals in the population has an equal chance of selection
• Sampling Frame = list of individuals from which the sample is drawn

Sampling vs. Non-sampling Errors
• Sampling Error
  o Pure chance (random) difference between sample & population (aka 'white noise')
  o Random: no one can guess the outcome, but there is an underlying set of outcomes that are equally likely
  o It is impossible to match the sample to the population b/c there are too many characteristics to think of and match
  o Sampling variability = the sample-to-sample differences
• Non-sampling Errors
  o Systematic (not random) difference between sample & population
  o Biased estimate = statistic is systematically higher or lower than the parameter
  o Systematic errors in data collection:
    ▪ Systematic lying (ex. ppl overestimate income)
    ▪ Poor survey instrument design (ex. unclear questions)
  o Non-response bias:
    ▪ Low response rate and non-responders are non-random (ex. selection)
  o Sampling frame differs from target population
    ▪ Undercoverage = not all portions of the population are sampled

Sample → (used to calculate) → Statistic → (used to estimate) → Parameter → (tells us about) → Population

• Population Parameter = a numerically valued attribute of a model for a population (hope to estimate from sample data)
• Bias = any systematic failure of a sampling method to represent its population
• Measurement error = intentional or unintentional inaccurate response to a survey question

Valid Survey
• Know what you want to know
• Use the right sampling frame
• Ask specific rather than general questions
• Watch for biases
  o Nonresponse bias = bias introduced to a sample when a large fraction of those sampled fail to respond
  o Voluntary response bias
  o Response bias = tendency of respondents to tailor their responses to please the interviewer; also a consequence of slanted question wording
• Be careful with question phrasing
• Be careful with answer phrasing
  o Measurement errors
  o Pilot Test = a small trial run of a study to check that the methods of the study are sound
• Be sure you really want a representative sample

Lecture 2: Tabulations, Bar/Pie Charts, Histograms & Centre

Describing 1 variable (with few unique values)
• Bar Charts = display the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison
• Pie Charts = show the whole group of cases as a circle (great for 1/2, 1/4, 1/8 comparisons)
• Segmented/Stacked Bar Charts = a bar chart that treats each bar as the "whole" and divides it proportionally into segments corresponding to the % in each group
• Stem & Leaf Display = like a histogram but also gives the individual values (requires quantitative data)
• Tabulation = lists all unique values in the data & their relative frequency (aka frequency table, relative frequency table)
  o Basis of bar/pie chart
  o Interval, ordinal or nominal data
• One variable with interval data
  o Histograms = a graph that shows how the data are distributed
    ▪ Frequency Histogram = bar height measures the number of observations in the bin
    ▪ Relative Frequency Histogram = bar height measures the fraction of observations in the bin relative to the total number
    ▪ Density Histogram = bar area measures the fraction of observations in the bin relative to the total number
  o Classes (bins) = non-overlapping and equal-sized intervals that cover the range
    ▪ The number of bins selected changes the appearance of the histogram
    ▪ Sturges' formula: # of bins = 1 + 3.3*log(n), with log in base 10
  o Shape of Things **Review Lecture, slide 18-21**
    ▪ Histograms give an overview of a variable with a picture (can make informal inferences about the shape of the population)
    ▪ Symmetric = split equally to left and right
    ▪ Positively Skewed = long tail to the right (skewed to right)
    ▪ Negatively Skewed = long tail to the left (skewed to left)
    ▪ Modality = # of major peaks
    ▪ Bell/Normal/Gaussian (means unimodal, symmetrical)

Describing 2 variables (with few unique pairs)
• Cross tabulation = measures the frequency with which two variables take each possible pair of values (any kind of data) (aka contingency table of two variables)
  o Basis of pie/bar chart
  o Shows relationships between two variables
  o Interval, ordinal or nominal data
  o Creates contingency tables
• Marginal distribution = frequency distribution of either one of the variables
• Conditional distribution = the distribution of a variable restricting the WHO to consider only a smaller group of individuals

Sample vs. Population
• A sample contains only a subset of the observations in a population (so sampling error too)
• Sampling noise = difference between population and sample simply due to random chance
  o Driven by the size of the sample (and not the size of the sample relative to the size of the population, which is usually assumed infinite)
• Sampling error is always present
  o Statistics is the study of how to make inferences in light of sampling error
  o Never see shapes in perfect form
  o Consider sample size (n) when making informal inferences (larger = more accurate)

Measures of Central Tendency
• Sample statistics are often called summary statistics b/c they are meant to give a concise idea of what the data "look like"
• Parameter vs. statistic:
  o Population parameter: describes the population, no sampling error
  o Sample statistic: describes the sample (real life), has sampling error
• Three sample statistics provide numeric measures of central tendency
• Mean = sum of all observations divided by n
• Median = middle observation after sorting
  o If n is an even #, calculate the median by averaging the two middle observations
  o Better choice for skewed data than the mean
• Mode = the value that occurs with the greatest frequency
  o With interval data often use the modal class
  o Modal Class = class with the most observations

Sensitivity to Outliers
• Outliers = extremely large or small values different from the bulk of the data
• Robust = not sensitive to outliers
  o Mean is not robust: as the balance point of the data, a single outlier can raise or lower it
  o Median is robust b/c it depends only on the middle observation(s) after sorting, so extreme values do not move it (though it can be more subject to sampling error than the mean)
**REVIEW LECTURE DIAGRAMS**

Graph Problems
• Area principle = a principle that helps to interpret statistical information by insisting that in a statistical display each data value be represented by the same amount of area (ex. of a violation: 3D pie chart)
• Keep it honest (all % add up to 100%)
• Look at the data separately too when there is more than 1 variable or a contingency table was created
• Use a large enough sample size (especially for pie charts)
• Don't overstate the case
• Simpson's Paradox = a phenomenon that arises when averages, or percentages, are taken across different groups, and these group averages appear to contradict the overall averages

Lecture 3: Describing Interval Data (beyond a histogram & mean/median/mode)

4 Measures of Variability (spread)
• Summarize data variability with statistics:
  o Range = the difference between the largest and smallest observations
    ▪ Measure of variability: as the difference becomes bigger, the data are more variable
    ▪ Sample range is subject to sampling error
    ▪ Uses only 2 observations (biggest & smallest)
    ▪ Very sensitive to outliers
  o Variance = sum of the squared deviations from the mean divided by the degrees of freedom: s^2 = sum of (x_i - mean)^2 / (n - 1)
    ▪ Always ≥ 0
    ▪ Numerator: total sum of squares (TSS)
      - An observation far from the mean increases TSS a lot
    ▪ Denominator: degrees of freedom (df, v, 'nu')
      - Only n-1 free observations are left after calculating the mean
  o Standard Deviation = the square root of the variance
    ▪ Measured in the same units as the original variable
      - Variance is measured in units squared
    ▪ Differently shaped graphs can share the same standard deviation b/c, like the mean, s.d. uses all the data (all data in these graphs are centered around the middle)
    ▪ S.d. depends on the shape of the graph and the units of measure
    ▪ Possible for: range = 0 & s.d. = 0
    ▪ **Review Lecture, slide 6**
    ▪ Empirical Rule (Normal/Bell) ~ if the sample is from a normal population
      - About 68.3% of all obs. are within 1 s.d. of the mean
      - About 95.4% of all obs. are within 2 s.d. of the mean
      - About 99.7% of all obs. are within 3 s.d. of the mean
    ▪ Chebysheff's Theorem (always true, for any shape)
      - At least 100*(1 - 1/k^2)% of observations lie within k s.d.'s of the mean, for k > 1
        o At least 75% of the obs. lie within 2 s.d. of the mean (1 - 1/2^2 = 3/4)
        o At least 89% of the obs. lie within 3 s.d. of the mean (1 - 1/3^2 = 8/9)
        o Can be applied to all samples no matter how the population is distributed
        o Notes: k does not have to be an integer (ex. 1.5), difference between this and Empirical is that K >1 and K ≠ 0 for
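
The variance, standard deviation and Chebysheff bound from Lecture 3 can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names and the simulated right-skewed data set are assumptions made for illustration only. It computes the sample variance with n - 1 degrees of freedom, then compares the fraction of observations within k s.d. of the mean against the guaranteed minimum 1 - 1/k^2.

```python
import math
import random


def sample_variance(data):
    """Total sum of squares (TSS) divided by the degrees of freedom, n - 1."""
    n = len(data)
    mean = sum(data) / n
    tss = sum((x - mean) ** 2 for x in data)
    return tss / (n - 1)


def sample_sd(data):
    """Square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(data))


def chebysheff_bound(k):
    """Minimum fraction of observations within k s.d. of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def fraction_within(data, k):
    """Observed fraction of the data lying within k sample s.d. of the mean."""
    mean = sum(data) / len(data)
    sd = sample_sd(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


# Hypothetical, positively skewed data (simulated, not from the course).
random.seed(0)
skewed = [random.expovariate(1.0) for _ in range(1000)]

for k in (1.5, 2, 3):  # k does not have to be an integer
    print(k, round(chebysheff_bound(k), 3), round(fraction_within(skewed, k), 3))
```

For any data set the observed fraction should be at least the Chebysheff bound. The Empirical Rule percentages (68.3%, 95.4%, 99.7%) are only expected when the sample comes from a normal population, so they need not match for skewed data like this.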