cheat sheet2.odt

Damian Dupuy

Date: October 24, 2012

SAMPLING – SAMPLING DESIGNS
- simple random (most basic; uses some random generation method, e.g. a random number table)
- systematic sampling (take a systematic approach: rather than randomly selecting people through random number tables, a rule says we choose every 10th person, e.g. from a phonebook)
- stratified (organize the sample based on the organization of the population, so that the organization of the sample represents the organization of the population)
In each sample, everyone has an equal chance of being chosen.
Can also have spatial sampling designs (looking at where things happen, e.g. the concentration of lead on a site; involves applying some sort of physical grid):
- use Cartesian coordinates (lines of latitude/longitude)
- stratified random (the whole map is divided into squares, and then a set of points is randomly selected from those squares; ensures coverage of the map, e.g. the deposits of lead)
- transect (randomly select lines along a map, and sample along the lines)

Sampling Distribution
- Sample statistics will change or vary for each random sample selected
- Probability distributions for statistics are called sampling distributions (if you take multiple samples, calculate the mean of each, and plot those means, you get a distribution of a statistic; this is the difference between a sample distribution and a sampling distribution)
- sample distribution: distribution of actual scores, drawn from one sample
- sampling distribution: distribution of a statistic, drawn from multiple samples (this difference is often asked about in EXAMS)
- A sampling distribution is the distribution of a statistic that is drawn from all possible samples of a given size n
- Can be developed for any statistic, not just the mean (but we tend to use the mean)

Central Limit Theorem
- The sampling distribution will have its own mean and standard deviation
- But the mean of a
sampling distribution has important properties, summarized by the Central Limit Theorem
- If all samples are randomly drawn and independent, then the mean of the sampling distribution of sample means will be the population mean mu
- If our sample is large enough, we can use it to predict our population
- The frequency distribution of sample means will be normally distributed
- What this means for us: when the sample size is large, the sample mean is likely to be quite close to the population mean
- A large sample is more likely to be close to the true population mean than a smaller sample
- Theoretically, the dividing line between a large sample and a small sample is n = 30 (the minimum number of observations to carry out a test)

Central Limit Theorem – Variability
- The standard deviation of the sampling distribution is equal to the sample standard deviation divided by the square root of the sample size
- This is called the Standard Error of the Mean
- Indicates how much a typical sample mean is likely to differ from the true population mean
- Measures the amount of sampling error
- The larger the n, the smaller the amount of sampling error (smaller sampling error indicates less variability)
- The larger the n, the less variability there is, and the more peaked the curve is (leptokurtic)

Standard Error
- Standard error of the mean: $SE_{\bar{x}} = s / \sqrt{n}$
- Standard error of a proportion: $SE_p = \sqrt{pq / n}$, note: $q = 1 - p$

Central Limit Theorem III
- How large is large?
- If we have a normal population, the sample size doesn't matter; the sampling distribution will still be normal
- When the population is skewed, the sample size must be large (n greater than 30) before the sampling distribution becomes normal

Sample Estimation
- Statistical inference is concerned with making decisions or predictions about population parameters, using samples
- Two ways we do this:
  - estimation
  - hypothesis testing (make decisions about the parameter based on preconceived notions; draw conclusions about the literature that might apply
to our research scenario; statements about what we think might be the case, and these hypotheses are then tested)
- Estimators are calculated using information from samples
- Usually expressed as a formula
- Two different types:
  - Point (use the information in our sample to select a single value that predicts/represents our population; very specific about the value)
  - Interval (put a net around our sample mean and say our population falls somewhere within this net, in this range)
- Common to both types: the issue of confidence; probability gives us the level of confidence

Sample Estimation – Point
- Practically, several statistics exist that could be point estimates
- How does the estimate behave in repeated sampling? (targets diagram in the Mendenhall textbook)
- Two valuable characteristics of the best estimator:
  - unbiased (the more biased the sample, the more biased our observations, and the more biased our predictions)
  - small variance (the smaller the margin of error)
- Error of estimation: not actually clear how close the estimate is to our population parameter
- Diagram: distributions for biased and unbiased estimators (the more biased, the less likely it is able to predict our population)
- Diagram: comparison of estimator variability
- Under the Empirical Rule, 95% of all point estimates will lie within 2 (or more exactly 1.96) standard deviations of the mean
- If the estimate is unbiased, the difference between the point estimate and the true parameter value will be less than 1.96 standard deviations (standard errors) 95% of the time
- Can also be said as: 95% confident that your point estimate is within +/- 1.96 standard deviations of the population parameter, with a 5% chance that it's not
- Can call this the 95% margin of error
- Calculated as 1.96 * standard error

95% Margin of Error
- 95% Margin of Error = $1.96 \times (s / \sqrt{n})$
- Example: n = 50, $\bar{x}$ = 980 lbs, s = 105: $1.96 \times (105 / \sqrt{50}) = 29.10$ lbs
- Therefore, we can say with 95% confidence that the sample estimate of 980 lbs is within +/- 29 lbs of
the population parameter

Confidence Intervals
- Most often you don't know how precise the single sample mean is as an estimator (e.g. with smaller sample sizes)
- Place an interval around the sample mean, and calculate the probability of the true population mean falling within this interval
- Can say, with a measurable level of confidence, that the interval contains the true population parameter
- margin of error: it's definitely this
- confidence interval: it falls within this net that has been created around our sample mean

Confidence Interval – Formula
- Z values associated with, say, a 90% confidence level are +/- 1.65
- 90% confidence interval is $\bar{x} \pm 1.65 \times (s / \sqrt{n})$
- Therefore, the upper bound of the interval is $\bar{x} + 1.65 \times (s / \sqrt{n})$
- The lower bound is $\bar{x} - 1.65 \times (s / \sqrt{n})$

Confidence Intervals II
- What does it mean if you are 95% confident?
- If you constructed 20 intervals, each from different sample information, 19 out of 20 would contain the population parameter mu, and 1 would not
- But we can never be sure whether a particular interval contains mu (i.e. our sample could be one of the 5%); our level of confidence comes from repeating the process
- 95% chance that our sample is representative of the population
- 1.96, 1.65, and 2.33 are important z scores (CHECK ABOVE)

Choosing Correct Sample Size
- The only way we could have it all (a high level of confidence and a high level of precision) is to increase the sample size
- the curve gets steeper, with a smaller distribution around the mean; therefore, the smaller the amount of error
- Therefore we could be more precise in our
- How precise/close
- accuracy means: does our sample actually reflect the
- how close our sample really is to our population ???
not sure

- Total amount of information is due to:
  - Sampling design used
  - Sample size n (the bigger the sample, the more information about the population it will give)
- But how many observations should be included in the sample?
- Greater than or equal to 30; the more the better
- Have to consider the relationship between the width of the interval and the level of confidence
- as the width increases, we have to be concerned about it
- Increasing the interval width increases confidence but decreases precision
- The only way to increase confidence without increasing the width is to increase the sample size
- To have a small interval and a high confidence, increase the SAMPLE ***

Choosing Correct Sample Size II
- Taking a sample larger than necessary wastes time and effort (very costly)
- Factors to consider are:
  - Type of sample, i.e. random, stratified, etc.
  - Population parameter being estimated (is it the mean or the standard deviation?)
  - Degree of precision (width of confidence interval)
  - Level of confidence (what level of confidence do I require, and at what level will I be testing?)
- The only way to increase confidence and decrease the width is to increase the sample size
- At a particular confidence level, increasing the sample size provides greater precision and narrows the confidence interval

Choosing Correct Sample Size III (CHECK BOOK)
- $n = \left( \frac{z_{\alpha/2} \, s}{E} \right)^2$, where E is the amount of sampling error the researcher is willing to tolerate
- For data measured in proportions, the formula is $n = \left( \frac{z_{\alpha/2} \sqrt{pq}}{E} \right)^2$

Estimation – Small Samples
- When n is small, i.e.
n < 30
- Central Limit Theorem does not hold
- May not have a normal distribution
- Must use the 't' distribution instead of the 'z' distribution
- Mound shaped and symmetric around t = 0
- More variable than 'z'
- Shape depends on n
- Use degrees of freedom, n - 1 (sample size minus 1). MAKE SURE: if n is 25, you have to go back to 24
- even if you are given, for example, 1.645, you go down to 1.699
- When we can't use the z table, we use the 't' table :)

HYPOTHESIS TESTING
- Two ways to estimate your population (going through estimation): point and interval
- point: one value, and you can overshoot or undershoot the mark, or miss it totally
- interval: put it around the mean and, at the measured level of confidence, our population falls in that range; we might be wrong or right

Hypothesis Testing
- To determine the level of precision of estimates, a confidence interval is placed around the sample
- that our sample statistic, which is driven by a particular confidence, is in this range somewhere, and so on
- This is one form of 'inference'

Can Also Say
- A properly created sample is essential for the successful application of inferential statistics
- is this sample drawn from the population? (asking such questions)
- geographers often want to know if a sample is a representa
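
Quick numerical check of the formulas in these notes (standard error, 95% margin of error, 90% confidence interval, sample size). This is only a rough Python sketch: the function names are made up for illustration, the z values 1.96 and 1.65 are the ones used in the notes, and the tolerable errors (E = 20 lbs, E = 0.05) and p = 0.5 in the sample-size calls are assumed values that do not come from the lecture.

```python
import math


def standard_error_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def standard_error_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return math.sqrt(p * q / n)


def confidence_interval(xbar, z, se):
    """Interval xbar +/- z * SE, returned as (lower, upper)."""
    margin = z * se
    return xbar - margin, xbar + margin


def sample_size_mean(z, s, E):
    """n = (z * s / E)^2, rounded up to a whole observation."""
    return math.ceil((z * s / E) ** 2)


def sample_size_proportion(z, p, E):
    """n = (z * sqrt(p * q) / E)^2, rounded up to a whole observation."""
    return math.ceil((z * math.sqrt(p * (1 - p)) / E) ** 2)


# Worked example from the notes: n = 50, xbar = 980 lbs, s = 105
se = standard_error_mean(105, 50)
moe95 = 1.96 * se                            # 95% margin of error
low, high = confidence_interval(980, 1.65, se)  # 90% interval (z = 1.65 as in the notes)

print(round(moe95, 2))                       # 29.1
print(round(low, 2), round(high, 2))         # 955.5 1004.5

# Assumed (hypothetical) tolerable errors, not from the notes
print(sample_size_mean(1.96, 105, 20))       # 106
print(sample_size_proportion(1.96, 0.5, 0.05))  # 385
```

The margin of error matches the 29.10 lbs worked out in the notes. For n < 30 the z values would be swapped for t values at n - 1 degrees of freedom, which this sketch does not look up.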