# GGR270H1 Study Guide - Sample Size Determination, Statistical Parameter, Systematic Sampling

by OC4255

Date: October 24th, 2012

SAMPLING – SAMPLING DESIGNS

-Simple random (most basic; use some random generation method, e.g. a random number table) // -Systematic (take a systematic approach: rather than randomly selecting people through random number tables, a rule says we choose every 10th person, e.g. from a phonebook) // -Stratified (organize the sample based on the organization of the population, so that the organization of the sample represents the organization of the population) // In each design, everyone has an equal chance of being chosen

Can also have spatial sampling designs (looking at where things happen, e.g. the concentration of lead on a site; involves applying some sort of physical grid) // Use Cartesian coordinates (lines of latitude/longitude) // Stratified random (the whole map is divided into squares, and then a set of points is randomly selected from those squares; ensures coverage of the map, e.g. the deposits of lead) // Transect (randomly select lines along a map, and sample along the lines)
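The three non-spatial designs above can be sketched in a few lines of Python. This is only an illustration and not part of the lecture: the function names, the random starting point for the systematic sample, the proportional allocation in the stratified sample, and the made-up population of 100 people are all assumptions.

```python
import random

def simple_random_sample(population, n, rng):
    """Pick n members at random, without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick every k-th member, starting from a random position in the first k."""
    start = rng.randrange(k)  # assumption: random start, then a fixed step of k
    return population[start::k]

def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from each stratum so the sample mirrors the population."""
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # assumption: proportional allocation
        sample.extend(rng.sample(members, n))
    return sample

rng = random.Random(270)        # fixed seed so the run is repeatable
people = list(range(1, 101))    # hypothetical population of 100 people
strata = {"urban": people[:60], "rural": people[60:]}  # hypothetical strata

srs = simple_random_sample(people, 10, rng)
every_10th = systematic_sample(people, 10, rng)
proportional = stratified_sample(strata, 0.10, rng)
```

With a 60/40 split in the hypothetical population, the stratified sample of 10 contains 6 urban and 4 rural members, which is what "the organization of the sample represents the organization of the population" means in practice.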

Sampling Distribution ==> Sample statistics will change or vary for each random sample selected // Probability distributions for statistics are called sampling distributions (if you take multiple samples, calculate the mean of each, and plot those means, what you have is a distribution of a statistic; that is the difference between a sample distribution and a sampling distribution) // Sample distribution: distribution of actual scores, drawn from one sample // Sampling distribution: distribution of a statistic, drawn from multiple samples // This difference is often asked about in EXAMS // A sampling distribution is the distribution of a statistic that is drawn from all possible samples of a given size n // Can be developed for any statistic, not just the mean (but we tend to use the mean)

Central Limit Theorem ==> Sampling distribution will have its own mean and standard deviation // But the mean of a sampling distribution has important properties, summarized by the Central Limit Theorem // If all samples are randomly drawn and independent, then the mean of the sampling distribution of sample means will be the population mean mu // If our sample is large enough, we can use it to predict our population // The frequency distribution of sample means will be normally distributed // What this means for us is that when the sample size is large, the sample mean is likely to be quite close to the population mean // A large sample is more likely to be closer to the true population mean than a smaller sample // Theoretically, the cutoff between a large sample and a small sample is n=30 (the minimum number of observations to carry out a test)

Central Limit Theorem – Variability ==> Standard deviation of the sampling distribution is equal to the sample standard deviation divided by the square root of the sample size // This is called the Standard Error of the Mean // Indicates how much a typical sample mean is likely to differ from the true population mean // Measures the amount of sampling error // The larger the n, the smaller the amount of sampling error (smaller sampling error indicates less variability) // The larger the n, the less variability there is = the more peaked the curve is (leptokurtic)

Standard Error ==>

Standard error of the mean: $SE_{\bar{x}} = \dfrac{s}{\sqrt{n}}$

Standard error of a proportion: $SE_p = \sqrt{\dfrac{pq}{n}}$, Note: $q = 1 - p$
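A small simulation can make the sampling distribution and the standard error concrete. The sketch below is an illustration under stated assumptions, not something from the lecture: it draws 2000 samples of size 40 from a skewed (exponential) population whose mean and standard deviation are both 1, and the helper names are made up for the example. Because the population standard deviation is known here, it is used in place of the sample standard deviation s.

```python
import random
from math import sqrt
from statistics import mean, stdev

def standard_error_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / sqrt(n)

def standard_error_proportion(p, n):
    """Standard error of a proportion: sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return sqrt(p * q / n)

rng = random.Random(2012)   # fixed seed so the run is repeatable
n = 40                      # size of each sample (assumed)
num_samples = 2000          # number of repeated samples (assumed)

# Skewed population: exponential with mean 1 and standard deviation 1.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)           # should sit near the population mean, 1
sd_of_means = stdev(sample_means)            # spread of the sampling distribution
predicted_se = standard_error_mean(1.0, n)   # 1 / sqrt(40), about 0.158
```

The spread of the simulated means should land close to the predicted standard error, and a histogram of `sample_means` would look roughly normal even though the population itself is skewed.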

Central Limit Theorem III ==> How large is large? // If we have a normal population, it doesn't matter what our sample size is; the sampling distribution will still be normal // When the population is skewed, the sample size must be large (n greater than 30) before the sampling distribution becomes normal

Sample Estimation ==> Statistical inference is concerned with making decisions or predictions about population parameters using samples // Two ways we do this: ==> -Estimation & ==> -Hypothesis testing (make decisions about the parameter based on preconceived notions; draw on conclusions from the literature that might apply to our research scenario to make statements about what we think might be the case, and these hypotheses are then tested) // Estimators are calculated using information from samples // Usually expressed as a formula

Two different types: ---> -Point (use the info in our sample to select a single value that predicts/represents our population; very specific about the value) & ---> -Interval (put a net around our sample mean and say our population value falls somewhere within this net, in this range) // Common to both types: the issue of confidence // Probability gives us the level of confidence

Sample Estimation – Point ==> Practically, several statistics exist that could be point estimates // How does the estimate behave in repeated sampling? // Targets diagram in Mendenhall textbook // Two valuable characteristics of the best estimator: // -Unbiased (the more biased the sample, the more biased our observations and the more biased our predictions) // -Small variance (the smaller the variance, the smaller the margin of error)

Error of Estimation ==> Not actually clear how close the estimate is to our population parameter // Diagram: distributions for biased and unbiased estimators ---> The more biased the estimator, the less likely it is able to predict our population // Diagram: comparison of estimator variability // Under the Empirical Rule, 95% of all point estimates will lie within 2 (or more exactly 1.96) standard deviations of the mean // If the estimate is unbiased, the difference between the point estimate and the true parameter value will be less than 1.96 standard deviations, or standard errors, 95% of the time // Can also be said as: 95% confident that your point estimate is within +/- 1.96 standard errors of the population parameter, and 5% chance that it's not // Can call this the 95% margin of error // Calculated as 1.96 * standard error

95% Margin of Error ==>

$$\text{95\% Margin of Error} = 1.96 \times \frac{s}{\sqrt{n}}$$

Example: n = 50, xbar = 980 lbs, s = 105 lbs

$$1.96 \times \frac{105}{\sqrt{50}} = 29.10 \text{ lbs}$$

Therefore, we can say with 95% confidence that the sample estimate of 980 lbs is within +/- 29 lbs of the population parameter
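The margin-of-error arithmetic above is easy to check in code. This is a minimal sketch using the numbers from the notes (n = 50, s = 105 lbs); the function name `margin_of_error` is made up for the example.

```python
from math import sqrt

def margin_of_error(s, n, z=1.96):
    """z times the standard error of the mean, s / sqrt(n)."""
    return z * s / sqrt(n)

moe = margin_of_error(s=105, n=50)
print(f"95% margin of error: {moe:.2f} lbs")  # prints "95% margin of error: 29.10 lbs"
```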

Confidence Intervals ==> Most often you don't know how precise the single sample mean is as an estimator (e.g. with smaller sample sizes) // Place an interval around the sample mean, and calculate the probability of the true population mean falling within this interval // Can say, with a measurable level of confidence, that the interval contains the true population parameter // Margin of error: it's definitely this // Confidence intervals: it falls within this net that has been created around our sample mean

Confidence Interval – Formula ==> Z values associated with, say, a 90% confidence level are +/- 1.65 // 90% confidence interval is $\bar{x} \pm 1.65 \times \frac{s}{\sqrt{n}}$ // Therefore, the upper bound of the interval is $\bar{x} + 1.65 \times \frac{s}{\sqrt{n}}$ // The lower bound is $\bar{x} - 1.65 \times \frac{s}{\sqrt{n}}$

Confidence Intervals II ==> What does it mean if you are 95% confident? // If you constructed 20 intervals, each from different sample information, about 19 out of 20 would contain the population parameter mu, and 1 would not // But we can never be sure whether a particular interval contains mu (i.e. our sample could be one of the 5%); our level of confidence comes from repeating the process // 95% chance that our sample is representative of the population // 1.96, 1.65, and 2.33 are important z scores (CHECK ABOVE)
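The z scores and the interval formula can be reproduced with Python's standard library. This is a sketch under assumptions: the function names are invented for the example, the interval uses the margin-of-error numbers from earlier in the notes (xbar = 980, s = 105, n = 50), and it assumes a large sample so that z rather than t applies.

```python
from math import sqrt
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

def confidence_interval(xbar, s, n, level=0.95):
    """Interval xbar +/- z * s / sqrt(n) around the sample mean."""
    margin = z_for_confidence(level) * s / sqrt(n)
    return xbar - margin, xbar + margin

z90 = z_for_confidence(0.90)  # about 1.645, which the notes round to 1.65
z95 = z_for_confidence(0.95)  # about 1.960
z98 = z_for_confidence(0.98)  # about 2.326, which the notes round to 2.33

lower, upper = confidence_interval(xbar=980, s=105, n=50, level=0.95)
```

For the example data the 95% interval runs from roughly 951 lbs to 1009 lbs, i.e. the sample mean plus or minus the 29 lb margin of error.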

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Choosing Correct Sample Size ==> The only way we could have it all (high level of confidence and high level of precision) is to increase the sample size // The curve gets steeper, with a smaller distribution around the mean // Therefore, the smaller the amount of error // Therefore we could be more precise // Precision: how precise/close // Accuracy means --> does our sample actually reflect the population, how close our sample really is to our population (??? not sure)

Total amount of information is due to: --> Sampling design used // Sample size n // The bigger the sample, the more information about the population it will tell // But how many observations should be included in the sample? // Greater than or equal to 30; the more the better // Have to consider the relationship between the width of the interval and the level of confidence // Increasing the interval width increases confidence but decreases precision // Only way to increase confidence without increasing the width is to increase sample size // To have a small interval and a high confidence, increase the SAMPLE SIZE***

Choosing Correct Sample Size II ==> Taking a sample larger than necessary wastes time and effort (very costly) // Factors to consider are: // Type of sample, i.e. random, stratified, etc. // Population parameter being estimated (is it the mean or std deviation?) // Degree of precision (width of confidence interval) // Level of confidence (what level of confidence do I require and what level will I be testing at) // The only way to increase confidence and decrease the width is to increase the sample size // At a particular confidence level, increasing the sample size provides greater precision and narrows the confidence interval

Choosing Correct Sample Size III (CHECK BOOK) ==>

$$n = \left(\frac{Z_{\alpha/2}\, s}{E}\right)^2$$

Where E is the amount of sampling error the researcher is willing to tolerate // For data measured in proportions, the formula is:

$$n = \left(\frac{Z_{\alpha/2}\,\sqrt{pq}}{E}\right)^2$$
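The two sample-size formulas can be tried out with a short sketch. The inputs below are hypothetical (a tolerated error of 20 lbs with s = 105, and a tolerated error of 0.05 with p = 0.5), the function names are made up, and rounding up to the next whole observation is an assumption rather than something stated in the notes.

```python
from math import ceil, sqrt

def sample_size_for_mean(s, error, z=1.96):
    """n = (z * s / E)^2, rounded up to a whole observation (assumed convention)."""
    return ceil((z * s / error) ** 2)

def sample_size_for_proportion(p, error, z=1.96):
    """n = (z * sqrt(p * q) / E)^2 with q = 1 - p, rounded up."""
    q = 1 - p
    return ceil((z * sqrt(p * q) / error) ** 2)

n_mean = sample_size_for_mean(s=105, error=20)           # hypothetical: E = 20 lbs
n_prop = sample_size_for_proportion(p=0.5, error=0.05)   # hypothetical: E = 0.05
```

Halving the tolerated error E roughly quadruples the required n, which is why a narrow interval at a high confidence level is expensive.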

Estimation – Small Samples ==> When n is small, i.e. n<30 // Central Limit Theorem does not hold // May not have a normal distribution // Must use the 't' distribution instead of the 'z' distribution // Mound shaped and symmetric around t=0 // More variable than 'z' // Shape depends on 'n' // Use degrees of freedom, or 'n-1' // Sample size minus 1 (MAKE SURE): so if n is 25, you have to go back to 24 // Even if you are given, for example, 1.645 (z), you go to 1.699 (t) // When we can't use the z table we use the 't' table :)

HYPOTHESIS TESTING ==> Two ways to estimate your population: go through the estimation, point and interval // Point: one value, and you can over- or undershoot the mark or miss it totally // Interval: put it around the mean, and at the measured level of confidence our population value falls in the following range; we might be wrong or right

Hypothesis Testing ==> To determine the level of precision of estimates, a confidence interval is placed around the sample statistic // That is, our sample statistic, driven by a particular confidence level, is somewhere in this range // This is one form of 'inference'

Can also say ==> A properly created sample is essential for the successful application of inferential statistics // Is this sample drawn from the population? (asking such questions) // Geographers often want to know if a sample is representative of the population // Issue here may be to show that the sample differs significantly from the population

Hypothesis Testing – Classic Approach ==> Process that is generally followed for hypothesis testing: // State 'Null' and 'Alternate' hypotheses // Select appropriate statistical test // Select level of significance (EXAMPLE: 95% level of confidence means 5% is my level of significance; it is controlling for you in case you made a mistake through the test; it is extremely important) // Delineate regions of rejection and non-rejection of the null hypothesis // Calculate test statistic (each test we do has a particular formula associated with it) // Compare test statistic to theoretical value

Hypothesis Testing – Hypotheses ==> Two complementary hypotheses of interest: // Null hypothesis 'Ho' and // Alternate hypothesis 'Ha' // Generally, the researcher wants to support the 'alternate' hypothesis and reject the 'null' // We either reject the null hypothesis or we fail to reject it --> by implication, when we reject it we accept the research or alternative hypothesis // The 'Null' hypothesis --> you need to show that it doesn't make sense or work out // Essentially, you are testing the null hypothesis, so you assume it is true

Hypothesis Testing – Hypotheses II ==> Draw 1 of 2 conclusions: // Reject Ho and conclude Ha is true // Fail to reject Ho and accept Ho as true // Selection of the form of Ha depends on how the hypothesized difference is stated // Ha can have 3 different scenarios: is not equal to, is greater than, or is less than // Hypotheses are always stated in pairs // We derive support by formally testing our Ho: we assume that it must be true, but we must test it // Ho is always tested as equal to a value (Ho: μ = μH) // Ha: μ ≠ μH (not equal) or Ha: μ > μH (greater) or Ha: μ < μH (lesser)

=========================================================================================================================================

Hypothesis Testing – ERROR (how significant or not the test is) ==> Decision is either to reject or fail to reject the null hypothesis (determines how we test) // Based on a single measure, so there is a measurable chance of making an incorrect decision (we could do everything right in setting up the test and still completely mess up our interpretation) // In hypothesis testing, 2 sources of error: // Type 1 and Type 2

Type 1 Error ==> Reject the null hypothesis as false when in fact it is true // We are rejecting Ho when Ho is in fact true // The sin of commission (you are committing to the claim that your sample and your population are different when they shouldn't be) // Called alpha (α) error or false positive // Observing a difference when none actually exists // Make sure to minimize the probability of a Type 1 error occurring // In the household size example, we would be concluding that there is a significant difference between household size in Toronto and that nationally, when no significant difference actually exists // One of the serious errors to avoid

Type 2 Error ==> Fail to reject the null hypothesis when in fact it is false // The sin of omission // Called beta (β) error or false negative // Failing to observe a difference when one actually exists // In the household size example, we would be concluding that there is no significant difference between household size in Toronto and that nationally, when a significant difference actually exists // (((Type 1 is the most critical of the two errors)))

Hypothesis Testing – Test Selection ==> Test used is a function of the research question and research assumptions // Test will vary according to the number of samples drawn, sampling design, and scale of information (determines the test) // One of the most common is the One-Sample Means Test or One-Sample Difference of Means Test // Is there a significant difference between the sample mean and the population mean? ==> Used with interval/ratio scale // If we have interval/ratio scale data, we can calculate numerical variance, std deviation, etc. // It is the most common

Hypothesis Testing – Level of Significance ==> Placing a probability statement on the likelihood of sampling error // Placing them around particular values and finding out the likelihood of it occurring // Select a fairly low significance level (0.05 or 0.01) in order to avoid Type 1 error // Set them really low; most of them are 0.05 // Conclusion of the test is expressed in terms of the level of significance of the result // In classic hypothesis testing, rejecting the null hypothesis at 0.05 is the same as saying the statistical test is significant at the 0.05 level // There is only a 5% chance that we incorrectly made a statement about the null hypothesis, i.e. a 5% chance that we are rejecting the null hypothesis when it is in fact true

Hypothesis Testing – Rejection Regions ==> Selecting the significance level allows the regions of rejection and non-rejection of the null hypothesis to be created // Entire set of values that a test statistic could assume is divided into two sets or regions // Rejection region: values supporting the alternate hypothesis // Acceptance region: values supporting the null hypothesis // Region or regions of rejection can be directional or non-directional // If the alternate is directional, we will have one rejection region (it will be on the positive or the negative side) // If the alternate hypothesis is non-directional (mu is not equal to a value), we will have two rejection regions (one on the negative side and one on the positive side)

REJECTION REGIONS – Non-directional and Directional ==> For a directional test we no longer have Z(alpha/2) // The test statistic has to be significantly less than the z value // If we have a significance level of 0.05 and it's directional, then the z value is 1.65, since all the error is now on one side

Hypothesis Testing – Calculate Test Statistic ==> Regardless of test used, a test statistic is always created // Compare it to a critical value that we find with reference to our level of significance // In tests comparing sample means, calculate a z or t statistic // For large samples (over 30) you use z and for small samples (under 30) you use t // z for n>30 // t for n<30 (CHECK FORMULAS)

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

where $\sigma_{\bar{X}}$ is the standard error of the mean

Hypothesis Testing – P value ==> Decision to reject or fail to reject the null hypothesis is made by comparing the test statistic to a critical value of z or t // Once we have our conclusion we move to the next test // But, different significance levels may
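The classic approach described above (state the hypotheses, pick a significance level, delineate the rejection region, calculate z, compare) can be sketched as a one-sample z test. Everything specific here is an assumption for illustration: the function name, the `alternative` labels, and the hypothesized mean of 1000 lbs are invented, while n = 50, xbar = 980 and s = 105 are reused from the margin-of-error example. It applies only to large samples (n over 30); small samples would need the t table instead.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu_h, s, n, alpha=0.05, alternative="two-sided"):
    """Return (z, critical value, p-value, reject Ho?) for a large-sample mean test."""
    se = s / sqrt(n)                 # standard error of the mean
    z = (xbar - mu_h) / se           # test statistic
    std_normal = NormalDist()        # standard normal: mean 0, sd 1
    if alternative == "two-sided":   # Ha: mu != mu_h, error split over both tails
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":   # Ha: mu > mu_h, all error in the upper tail
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":      # Ha: mu < mu_h, all error in the lower tail
        critical = std_normal.inv_cdf(alpha)  # negative critical value
        reject = z < critical
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, critical, p_value, reject

# Hypothetical test: is the population mean different from 1000 lbs?
z, critical, p_value, reject = one_sample_z_test(xbar=980, mu_h=1000, s=105, n=50)
```

With these numbers z is about -1.35, which lies inside the non-rejection region of +/- 1.96, so the null hypothesis is not rejected at the 0.05 level.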
