
ECO220 MidTerm 1 Exam Notes


Department: Economics
Course: ECO220Y1
Professor: Jennifer Murdock
Semester: Fall

Description
ECO220 Notes

Lecture 1: Sampling Errors & Non-sampling Errors
• Goal → make inferences about population parameters from sample statistics
• Probability: the foundation for statistics
• Statistics: descriptive and inferential
  o Descriptive → describes what happened (ex. class average)
  o Inferential → draws conclusions about the data that are not 100% certain
  o Describe the sample (data) using statistics
  o Make inferences about the population and its parameters using the observed data (sample)
• Population = set of all items of interest (ex. all students @ UofT, for course evaluations)
• Parameter = descriptive measure of a population (ex. what %/fraction of the population)
• Sample = subset of the population (ex. a small group)
• Statistic = descriptive measure of a sample (ex. of the 200 in the sample, __ responded)

Sampling Error ("white noise", "sample noise", "sampling variability") = the purely random differences between a sample and the population that arise because the sample is a random subset of the population
• As the sample size gets larger, the sampling error tends to get smaller
  o Ex. picking 200 out of 60,000 could, by chance, give a result very different from the population
  o This is not a mistake: it is a random sample, and nothing is wrong with the survey itself
• The size of the sample determines the size of the sampling error
  o Larger samples = less sampling error

Example: a bag of M&Ms, sampled with a spoon; n = # of M&Ms in the spoonful, y = % of yellow among the n

  n  | y
  4  | 0/4 = 0%
  13 | 2/13 = 15.4%

• Population: the whole bag of M&Ms
• Parameter: what %/portion of the bag is yellow
• Sample drawn 2 times
• Sample statistic → % yellow

Law of Large Numbers
• Larger samples = smaller sampling errors
  o Sampling error decreases as n (sample size) increases
  o There is no "law of small numbers" (the mistaken belief that small samples represent the population)

Example: movies on demand; should a rural company offer a new channel? Randomly select 100 people and ask 2 different questions
• Population: the company's rural customer base
• Sample: 100 customers
• Sample statistics: mean = 2.3, proportion = 0.45
• Population parameters are not known; the sample statistics estimate them with sampling error

The Types of Information
• Variables = characteristics recorded about each individual or case (types of info)
• Quantitative = numerical measurements of a quantity or amount
  o Ex. a 10% decrease in prices will lead to a 20% increase in QD
• Qualitative = some assessment of quality or kind
  o Ex. an increase in price tends to lead to a decrease in QD
• Identifier variable = unique code for each product/customer

Data
• Rows of a data table correspond to individual cases
• People who answer a survey = respondents; people experimented on = subjects/participants/experimental units
• # of observations = sample size
• "Data" is plural ("these data are flawed"); "datum" is singular

3 Types of Data
• Interval = numerical measurements; real numbers that are quantitative/numerical (ex. how many marriages?)
• Ordinal = ranking of categories (ex. how would you rank marital status?)
• Nominal = unranked categories that are qualitative/categorical (only names are used)

Hierarchy of Data
1. Interval: real numbers → all calculations are valid
2. Ordinal: numbers must represent the ranked order → calculations based on the ordering process are valid
3. Nominal: arbitrary numbers that represent categories → only calculations based on frequencies are valid

3 Types of Data Sets
• Cross-sectional = a snapshot of different units taken in the same time period
  o Ex. annual GDP for 2010 for 20 countries (20 observations)
• Time series = tracks something over time
  o Stationary time series = no strong trend or change in variability (a histogram can then be used with the time series)
  o Ex. annual Canadian GDP for the 10 years up to 2010 (10 observations)
• Panel (longitudinal) = a cross-section of units where each is followed over time
  o Ex. annual GDP of 20 countries over the same 10 years (20 × 10 = 200 observations)

Sampling
• Stratified sampling = a sampling design in which the population is divided into several homogeneous subpopulations, or strata, and random samples are then drawn from each stratum
  o Strata = subsets of a population that are internally homogeneous but may differ from one another
• Systematic sampling = a sample drawn by selecting individuals systematically from a sampling frame
• Convenience sampling = a sampling technique that selects individuals who are conveniently available
  o May not represent the population
• Cluster sampling = a sample design in which groups, or clusters, representative of the population are chosen at random and a census is then taken of each
• Multistage sampling = sampling schemes that combine several sampling methods
• Sample size determines what can be concluded from the data, regardless of the size of the population
• Voluntary response sample
  o Hard to define the sampling frame; doesn't correspond to the population
  o Biased toward those with strong opinions (especially negative opinions)
• Simple random sample (SRS) = a sample in which each set of n individuals in the population has an equal chance of selection
• Sampling frame = list of individuals from which the sample is drawn

Sampling Vs. Non-sampling Errors
• Sampling error
  o Pure-chance (random) difference between sample & population (aka "white noise")
  o Random: no one can predict the outcome in advance, but there is an underlying set of outcomes that are equally likely
  o It is impossible to match the sample to the population because there are too many characteristics to think of and match
• Non-sampling errors
  o Systematic (not random) difference between sample & population
  o Biased estimate = statistic is systematically higher or lower than the parameter
  o Systematic errors in data collection:
    ▪ Systematic lying (ex. people overestimate their income)
    ▪ Poor survey instrument design (ex. unclear questions)
  o Non-response bias:
    ▪ Low response rate, and non-responders are non-random (ex. selection)
  o Sampling frame differs from the target population
  o Undercoverage = not all portions of the population are sampled
• Sampling variability = the sample-to-sample differences
  o Sample → (used to calculate) → Statistic → (used to estimate) → Parameter → (tells us about) → Population
• Population parameter = a numerically valued attribute of a model for a population (which we hope to estimate from sample data)
• Biased = any systematic failure of a sampling method to represent its population
• Measurement error = intentional or unintentional inaccurate response to a survey question

Valid Survey
• Know what you want to know
• Use the right sampling frame
• Ask specific rather than general questions
• Watch for biases
  o Nonresponse bias = bias introduced to a sample when a large fraction of those sampled fail to respond
  o Voluntary response bias
  o Response bias = tendency of respondents to tailor their responses to please the interviewer, or the consequence of slanted question wording
• Be careful with question phrasing
• Be careful with answer phrasing
  o Measurement errors
  o Pilot test = a small trial run of a study to check that the methods of the study are sound
• Be sure you really want a representative sample

Lecture 2: Tabulations, Bar/Pie Charts, Histograms & Centre

Describing 1 variable (with few unique values)
• Bar charts = display the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison
• Pie charts = show the whole group of cases as a circle (great for ½, ¼, 1/8 comparisons)
• Segmented/stacked bar charts = a bar chart that treats each bar as the "whole" and divides it proportionally into segments corresponding to the % in each group
• Stem & leaf display = like a histogram but also gives the individual values (requires the Quantitative Data Condition)
• Tabulation = lists all unique values in the data & their (relative) frequencies (aka frequency table, relative frequency table)
  o Basis of bar/pie charts
  o Interval, ordinal or nominal data
• One variable with interval data
  o Histogram = a graph that shows how the data are distributed
    ▪ Frequency histogram = bar height measures the number of observations in the bin
    ▪ Relative frequency histogram = bar height measures the fraction of observations in the bin relative to the total number
    ▪ Density histogram = bar area measures the fraction of observations in the bin relative to the total number
  o Classes (bins) = non-overlapping, equal-sized intervals that cover the range
    ▪ The number of bins selected changes the appearance of the histogram
    ▪ Sturges' formula: # of bins = 1 + 3.3·log₁₀(n)
  o Shape of things **Review lecture, slides 18-21**
    ▪ Histograms give an overview of a variable with a picture (can make informal inferences about the shape of the population)
    ▪ Symmetric = splits equally to the left and right
    ▪ Positively skewed = long tail to the right (skewed to the right)
    ▪ Negatively skewed = long tail to the left (skewed to the left)
    ▪ Modality = # of major peaks
    ▪ Bell/normal/Gaussian (means unimodal and symmetric)

Describing 2 variables (with few unique pairs)
• Cross tabulation = measures how often two variables take each possible pair of values (any kind of data) (aka contingency table or two-way table)
  o Basis of pie/bar charts
  o Shows relationships between two variables
  o Interval, ordinal or nominal data
  o Creates contingency tables
    ▪ Marginal distribution = frequency distribution of either one of the variables
    ▪ Conditional distribution = the distribution of a variable, restricting the WHO to consider only a smaller group of individuals

Sample vs. Population
• A sample contains only a subset of the observations in a population (so it has sampling error)
• Sampling noise = difference between population and sample simply due to random chance
  o Driven by the size of the sample (not the size of the sample relative to the size of the population, which is usually assumed infinite)
• Sampling error is always present
  o Statistics is the study of how to make inferences in light of sampling error
  o Sample shapes never appear in perfect form
  o Consider the sample size (n) when making informal inferences (larger = more accurate)

Measures of Central Tendency
• Sample statistics are often called summary statistics because they are meant to give a concise idea of what the data "look like"
• Population parameters have no sampling error; sample statistics (what we see in real life) have sampling error
• Three sample statistics provide numeric measures of central tendency:
  o Mean = sum of the observations divided by n
  o Median = middle observation after sorting
    ▪ If n is even, calculate the median by averaging the two middle observations
    ▪ Better choice than the mean for skewed data
  o Mode = the value that occurs with the greatest frequency
    ▪ With interval data, often use the modal class
    ▪ Modal class = class with the most observations

Sensitivity to Outliers
• Outliers = extremely large or small values, different from the bulk of the data
• Robust = not sensitive to outliers
  o Mean is not robust: a single extreme value can pull it up or down (the mean is the balance point of the data)
  o Median is robust because it depends only on the middle observation(s), not the extremes (but it is more subject to sampling error)
**Review lecture diagrams**

Graph Problems
• Area principle = a principle that helps to interpret statistical information by insisting that in a statistical display each data value be represented by the same amount of area (3D pie charts, for ex., violate it)
• Keep it honest (all %s should add up to 100%)
• Also look at the data separately when there is more than 1 variable or you have created a contingency table
• Use a large enough sample size (especially for pie charts)
• Don't overstate the case
• Simpson's paradox = a phenomenon that arises when averages, or percentages, are taken across different groups, and these group averages appear to contradict the overall averages

Lecture 3: Describing Interval Data (beyond a histogram & mean/median/mode)

4 Measures of Variability (Spread)
• Summarize data variability with statistics:
  o Range = the difference between the largest and smallest observations
    ▪ The bigger the difference, the more variable the data
    ▪ The sample range is subject to sampling error
    ▪ Uses only 2 observations (largest & smallest)
    ▪ Very sensitive to outliers
  o Variance = sum of the squared deviations from the mean divided by the degrees of freedom
    ▪ Always ≥ 0
    ▪ Numerator: total sum of squares (TSS); an observation far from the mean increases TSS a lot
    ▪ Denominator: degrees of freedom (df, v, "nu")
    ▪ Only n − 1 observations are free to vary once the mean is calculated
  o Standard deviation = the square root of the variance
    ▪ Measured in the same units as the original variable (the variance is measured in units squared)
    ▪ Distributions with different shapes can have the same s.d. (like the mean, the s.d. uses all the data)
    ▪ The s.d. depends on how the data are spread around the mean and on the units of measure
    ▪ Range = 0 & s.d. = 0 are possible (only when all observations are identical)
    ▪ **Review lecture, slide 6**
• Empirical Rule (normal/bell-shaped) ~ applies to samples from a normal population
  o About 68.3% of all obs. lie within 1 s.d. of the mean
  o About 95.4% of all obs. lie within 2 s.d. of the mean
  o About 99.7% of all obs. lie within 3 s.d. of the mean
• Chebysheff's Theorem (always true, for any shape)
  o At least $100(1 - 1/k^2)\%$ of observations lie within $k$ s.d.'s of the mean, for $k > 1$
  o At least 75% of the obs. lie within 2 s.d. of the mean ($1 - 1/2^2 = 3/4$)
  o At least 89% of the obs. lie within 3 s.d. of the mean ($1 - 1/3^2 = 8/9$)
  o Can be applied to all samples, no matter how the population is distributed
  o Notes: k does not have to be an integer (ex. 1.5); the difference between this and the Empirical Rule is that k > 1 and k ≠ 0 for …
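The Law of Large Numbers from Lecture 1 can be checked with a small simulation. This is a minimal sketch. It assumes a hypothetical population in which a proportion p = 0.45 would answer "yes"; that value is borrowed from the movie-on-demand sample statistic purely for illustration and is treated here as the parameter. The hypothetical function `mean_abs_sampling_error` draws many random samples of size n and averages how far each sample proportion lands from p. That average should shrink as n grows.

```python
import random
import statistics


def mean_abs_sampling_error(n: int, p: float, reps: int = 2000, seed: int = 1) -> float:
    """Average |sample proportion - p| across `reps` random samples of size n.

    Each observation is an independent yes/no draw with P(yes) = p, i.e. we
    sample from an (effectively) infinite population, as the notes assume.
    """
    rng = random.Random(seed)  # fixed seed so the results are reproducible
    errors = []
    for _ in range(reps):
        yes = sum(1 for _ in range(n) if rng.random() < p)
        errors.append(abs(yes / n - p))  # sampling error of this one sample
    return statistics.fmean(errors)


if __name__ == "__main__":
    p = 0.45  # hypothetical population proportion (the parameter)
    for n in (4, 13, 100, 1000):
        print(f"n = {n:>4}: average sampling error = {mean_abs_sampling_error(n, p):.4f}")
```

No single sample is "wrong". Only the typical size of the error changes with n.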
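The Lecture 1 sampling designs (SRS, systematic, stratified) can be sketched in a few lines of Python. The function names, the 60,000-ID sampling frame and the two strata below are hypothetical illustrations, not course material. `random.Random.sample` draws without replacement so that every subset of size n is equally likely, which matches the SRS definition.

```python
import random


def simple_random_sample(frame: list, n: int, rng: random.Random) -> list:
    """SRS: every set of n individuals in the frame is equally likely to be chosen."""
    return rng.sample(frame, n)


def systematic_sample(frame: list, n: int, rng: random.Random) -> list:
    """Random start, then every k-th individual, where k = len(frame) // n."""
    if not 0 < n <= len(frame):
        raise ValueError("need 0 < n <= len(frame)")
    k = len(frame) // n
    start = rng.randrange(k)  # random starting point among the first k individuals
    return frame[start::k][:n]


def stratified_sample(strata: dict, n_per_stratum: int, rng: random.Random) -> dict:
    """Draw a separate SRS of n_per_stratum from each stratum (name -> members)."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


if __name__ == "__main__":
    rng = random.Random(220)
    frame = list(range(1, 60_001))  # hypothetical frame: 60,000 student IDs
    print(simple_random_sample(frame, 5, rng))
    print(systematic_sample(frame, 200, rng)[:5])
    strata = {"first_year": frame[:15_000], "upper_year": frame[15_000:]}  # hypothetical strata
    print({name: len(s) for name, s in stratified_sample(strata, 100, rng).items()})
```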
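Sturges' formula from Lecture 2 and a relative frequency table for a histogram can be sketched as follows. The helper names are hypothetical. Rounding the formula to the nearest integer is an assumption, because the notes give only the formula itself. The bins are equal-width and non-overlapping, and the maximum value goes into the last bin so that the bins cover the whole range.

```python
import math


def sturges_bins(n: int) -> int:
    """Sturges' formula, 1 + 3.3*log10(n), rounded to the nearest whole number of bins."""
    return round(1 + 3.3 * math.log10(n))


def relative_frequencies(data: list, k: int) -> list:
    """Split [min, max] into k equal-width, non-overlapping bins and return
    (lower edge, upper edge, fraction of observations) for each bin."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all observations are equal; a histogram needs a range > 0")
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)  # the maximum goes in the last bin
        counts[i] += 1
    n = len(data)
    return [(lo + i * width, lo + (i + 1) * width, c / n) for i, c in enumerate(counts)]


if __name__ == "__main__":
    print(sturges_bins(200))  # 1 + 3.3*log10(200) is about 8.6, so 9 bins
    for lower, upper, share in relative_frequencies(list(range(1, 11)), 5):
        print(f"{lower:.1f} to {upper:.1f}: {share:.0%}")
```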
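In the Lecture 2 cross tabulation, a marginal distribution totals over one variable, and a conditional distribution restricts attention to one group (the WHO). The sketch below uses `collections.Counter` as the contingency table. The survey answers and function names are hypothetical.

```python
from collections import Counter


def cross_tab(pairs):
    """Count how often each (row value, column value) pair occurs."""
    return Counter(pairs)


def marginal(table: Counter, axis: int) -> Counter:
    """Marginal distribution: totals for the row variable (axis=0) or column variable (axis=1)."""
    totals = Counter()
    for (row, col), count in table.items():
        totals[row if axis == 0 else col] += count
    return totals


def conditional(table: Counter, row_value) -> dict:
    """Conditional distribution of the column variable, restricted to one row value."""
    subset = {col: count for (row, col), count in table.items() if row == row_value}
    total = sum(subset.values())
    return {col: count / total for col, count in subset.items()}


if __name__ == "__main__":
    # Hypothetical survey answers: (customer location, would subscribe?)
    answers = ([("rural", "yes")] * 3 + [("rural", "no")] * 1
               + [("urban", "yes")] * 2 + [("urban", "no")] * 4)
    table = cross_tab(answers)
    print(marginal(table, axis=0))      # customers per location
    print(conditional(table, "rural"))  # share of yes/no among rural customers only
```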
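The Lecture 2 point about robustness is easy to see in a small example. Replacing one observation with an outlier moves the mean a lot but leaves the median (and here the mode) unchanged. The data are hypothetical. `median` is written out to show the even-n rule, while the mean and mode come from Python's `statistics` module.

```python
import statistics


def median(data):
    """Middle observation after sorting; for even n, average the two middle ones."""
    s = sorted(data)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def centre(data):
    """Return (mean, median, mode) of a sample."""
    return statistics.mean(data), median(data), statistics.mode(data)


if __name__ == "__main__":
    sample = [3, 4, 4, 5, 6, 7, 8]         # hypothetical data
    with_outlier = [3, 4, 4, 5, 6, 7, 80]  # largest value replaced by an outlier
    print(centre(sample))        # mean about 5.29, median 5, mode 4
    print(centre(with_outlier))  # mean about 15.57, median 5, mode 4
```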
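The Lecture 3 variance definition (TSS divided by n − 1 degrees of freedom) translates directly into code. This sketch uses hypothetical data and helper names. Python's `statistics.variance` uses the same n − 1 denominator, which makes it a convenient cross-check.

```python
import math
import statistics


def sample_variance(data):
    """Total sum of squares (TSS) divided by the degrees of freedom, n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(data) / n
    tss = sum((x - mean) ** 2 for x in data)  # far-from-mean observations dominate TSS
    return tss / (n - 1)


def spread(data):
    """Return (range, variance, standard deviation) of a sample."""
    var = sample_variance(data)
    return max(data) - min(data), var, math.sqrt(var)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample: mean 5, TSS 32, df 7
    r, var, sd = spread(data)
    print(f"range = {r}, variance = {var:.3f}, s.d. = {sd:.3f}")
    print(statistics.variance(data))  # same n - 1 formula, for comparison
```

Identical observations give a range and s.d. of 0, matching the note above.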
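Chebysheff's Theorem can be checked against data of any shape. This sketch uses simulated right-skewed (exponential) data and hypothetical helper names. It compares the guaranteed minimum fraction `1 - 1/k**2` with the observed fraction of observations within k sample standard deviations of the mean. The observed fraction should never fall below the bound, whatever the shape.

```python
import random
import statistics


def chebysheff_bound(k: float) -> float:
    """Minimum fraction of observations within k s.d.'s of the mean (requires k > 1)."""
    if k <= 1:
        raise ValueError("the theorem is only informative for k > 1")
    return 1 - 1 / k**2


def fraction_within(data, k: float) -> float:
    """Observed fraction of observations within k sample s.d.'s of the sample mean."""
    mean, sd = statistics.mean(data), statistics.stdev(data)
    return sum(1 for x in data if abs(x - mean) <= k * sd) / len(data)


if __name__ == "__main__":
    rng = random.Random(220)
    skewed = [rng.expovariate(1.0) for _ in range(1_000)]  # hypothetical right-skewed data
    for k in (1.5, 2, 3):
        print(f"k = {k}: bound {chebysheff_bound(k):.3f}, observed {fraction_within(skewed, k):.3f}")
```

For bell-shaped data, the Empirical Rule's figures (about 95.4% within 2 s.d.) are much tighter than the 75% guarantee here. That is the trade-off for a rule that holds for every shape.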