ECON10005 - Quantitative Methods - Full Summary

Lecture One:
Descriptive statistics is a process of concisely summarising the characteristics of sets of data.
Inferential statistics involves constructing estimates of these characteristics, and testing hypotheses
about the world, based on sets of data.
Modelling and analysis combines these to build models that represent relationships and trends in
reality in a systematic way.
Types of Data:
Numerical or quantitative data are real numbers with specific numerical values.
Nominal or qualitative data are non-numerical data sorted into categories on the basis of qualitative
attributes.
Ordinal or ranked data are nominal data that can be ranked.
- The population is the complete set of data that we seek to obtain information about
- The sample is a part of the population that is selected (or sampled) in some way using a
sampling frame
- A characteristic of a population is called a parameter
- A characteristic of a sample is called a statistic
- The difference between our estimate and the true (usually unknown) parameter is the
sampling error
- In a random sample, all population members have an equal chance of being sampled
In a population, a perfect stratum would be a group with:
- individual observations that are similar to the other observations in that stratum
- characteristics that differ from the other strata in the population
Stratified sampling can improve accuracy but may be more costly.
In a population, a perfect cluster would be a group with:
- individual observations that are different from the other observations in that cluster
- characteristics similar to the other clusters in the population
Cluster sampling can reduce costs but may be less accurate.
This is the cost/accuracy trade-off.
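As a rough illustration of the two designs, here is a minimal Python sketch of simple random versus stratified sampling; the population, the stratum labels and the sample size of 30 are all made up for illustration:

```python
import random

# Hypothetical population: 100 members in stratum "A", 50 in stratum "B"
population = [("A", x) for x in range(100)] + [("B", x) for x in range(100, 150)]

# Simple random sample: every member has an equal chance of selection
srs = random.sample(population, 30)

# Stratified sample: sample within each stratum separately, in proportion
# to stratum size; homogeneous strata can improve accuracy
strata = {}
for label, value in population:
    strata.setdefault(label, []).append((label, value))

stratified = []
for label, members in strata.items():
    k = round(30 * len(members) / len(population))
    stratified.extend(random.sample(members, k))
```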
Lecture Two:
Ceteris paribus: the assumption of holding all other variables constant
Cross-sectional data are:
- collected from (across) a number of different entities (such as individuals, households, firms,
regions or countries) at a particular point in time
- usually a random sample (but not always)
- not able to be arranged in any “natural” order (we can sort or rank the data into any order
we choose)
- often (but not only) usefully presented with histograms
Time series data are:
- collected over time on one particular ‘entity’
- data with observations which are likely to depend on what has happened in the past
- data with a natural ordering according to time
- often (but not only) presented as line charts
Lecture Three:
Measures of Centre:
Mean/Average: population: µ; sample: x̄ = Σxᵢ/n
- easy to calculate
- sensitive to extreme observations
Median: middle number, or average of two middle numbers
- not sensitive to extreme observations
Mode: most frequently occurring number
- only used for finding most common outcome
x̄ − µ = sampling error
If a distribution is uni-modal then we can show that it is:
- Symmetrical if mean = median = mode
- Right-skewed if mean > median > mode
- Left-skewed if mode > median > mean
Measures of Variation:
Population variance measures the average of the squared deviations between each observation and the population mean: σ² = (1/N) Σᵢ (xᵢ − µ)²
Population standard deviation is the square root of population variance: σ = √[(1/N) Σᵢ (xᵢ − µ)²]
Sample variance measures the average of the squared deviations between each observation and the sample mean: s² = (1/(n−1)) Σᵢ (xᵢ − x̄)²
Sample standard deviation is the square root of sample variance: s = √[(1/(n−1)) Σᵢ (xᵢ − x̄)²]
Coefficient of variation measures the variation in a sample (given by its standard deviation) relative to that sample's mean. It is expressed as a percentage to provide a unit-free measurement, letting us compare different samples: CV = 100 × (s/x̄) %
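A minimal Python sketch of the Lecture Three measures using only the standard library's statistics module; the data values are hypothetical, with one outlier included to show the mean's sensitivity:

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 21]            # hypothetical sample with one outlier

mean = statistics.mean(data)              # x̄ = Σxᵢ/n, pulled up by the outlier
median = statistics.median(data)          # middle value, robust to the outlier
mode = statistics.mode(data)              # most frequent value (4)
s2 = statistics.variance(data)            # sample variance, n−1 divisor
s = statistics.stdev(data)                # sample standard deviation
cv = 100 * s / mean                       # coefficient of variation, in %

# Here mean > median > mode, consistent with a right-skewed distribution
print(mean, median, mode, s2, s, cv)
```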
Lecture Four:
Measures of Association:
Covariance measures the co-variation between two sets of observations.
With a population of size N having observations (x1, y1), (x2, y2), …, (xN, yN) and with µx, µy being the respective means of the xᵢ and yᵢ terms, covariance is calculated as:
COV(X,Y) = (1/N) Σᵢ₌₁ᴺ (xᵢ − µx)(yᵢ − µy)
If we have a sample of size n, with sample means x̄ and ȳ, the covariance is calculated as:
cov(x,y) = (1/(n−1)) Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ)
Problems with covariance: it is difficult to interpret the strength of a relationship because covariance is sensitive to units.
Correlation gives us a measure of association which is not affected by units.
Sample correlation coefficient: r = cov(x,y)/(sx sy), where sx and sy are the sample standard deviations
Population correlation coefficient: ρ = COV(X,Y)/(σx σy), where σx and σy are the population standard deviations
- r=1, perfect positive linear relationship
- r=-1, perfect negative linear relationship
- r=0, no linear relationship
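A minimal Python sketch of sample covariance and correlation from the formulas above; x and y are hypothetical paired observations chosen to have a strong positive linear relationship:

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical paired data
y = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(x)
x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# r = cov(x, y) / (sx · sy): unit-free, always between −1 and 1
r = cov / (statistics.stdev(x) * statistics.stdev(y))

print(cov, r)   # r is close to 1: a near-perfect positive linear relationship
```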
Lecture Five:
A random experiment is a procedure that generates outcomes that are not known with certainty
until observed.
A random variable (RV) is a variable with a value that is determined by the outcome of an
experiment.
A discrete random variable has a countable number (K) of possible outcomes, each with a specific probability associated with it.
Univariate data has one random variable.
Bivariate data has two random variables.
If X is a random variable with K possible outcomes, then an individual value of X is written as xi,
i=1,2,3…K
The probability of observing X = xᵢ is written as P(X=xᵢ) or p(xᵢ), where:
- 0 ≤ p(xᵢ) ≤ 1
- Σᵢ p(xᵢ) = 1
- That is, all probabilities must lie between 0 and 1 and all added together equal 1 in total
Expected Value/Mean of a random variable: the value of X one would expect to get on average over a large/infinite number of repeated trials: µx = E(X) = Σᵢ xᵢ p(xᵢ)
Variance of a random variable: the probability-weighted average of all squared deviations between each possible outcome and the expected value: σ² = V(X) = Σᵢ (xᵢ − µx)² p(xᵢ), or equivalently Σᵢ xᵢ² p(xᵢ) − µx²
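A minimal Python sketch of E(X) and V(X) for a discrete random variable; the distribution here (a loaded die) is hypothetical:

```python
# Hypothetical loaded die: outcome 6 is five times as likely as the others
outcomes = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]
assert abs(sum(probs) - 1.0) < 1e-9      # probabilities must sum to 1

# µx = E(X) = Σ xᵢ p(xᵢ)
mu = sum(x * p for x, p in zip(outcomes, probs))

# σ² = V(X) = Σ (xᵢ − µx)² p(xᵢ), or equivalently Σ xᵢ² p(xᵢ) − µx²
var = sum((x - mu) ** 2 * p for x, p in zip(outcomes, probs))
var_alt = sum(x ** 2 * p for x, p in zip(outcomes, probs)) - mu ** 2
assert abs(var - var_alt) < 1e-9         # the two forms agree

print(mu, var)
```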
Lecture Six:
Rules of Expected Values and Variances:
- E(a) = a; V(a) = 0
- E(aX) = aE(X); V(aX) = a²V(X)
- E(a + X) = a + E(X); V(a + X) = V(X)
- E(a + bX) = a + bE(X); V(a + bX) = b²V(X)
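A minimal simulation sketch checking the last of these rules numerically; the distribution of X and the constants a and b are made up for illustration:

```python
import random
import statistics

# Hypothetical X ~ N(10, 2²), simulated; constants a and b are arbitrary
X = [random.gauss(10, 2) for _ in range(100_000)]
a, b = 3.0, 2.0
Y = [a + b * x for x in X]   # Y = a + bX

# E(a + bX) = a + bE(X); V(a + bX) = b²V(X)
print(statistics.mean(Y), a + b * statistics.mean(X))           # ≈ equal
print(statistics.variance(Y), b**2 * statistics.variance(X))    # ≈ equal
```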

Binomial Distribution:
- Each trial is independent
- There are n trials, each with two possible outcomes: success, with probability p, or failure, with probability q = 1 − p
A binomial random variable is the total number of successes in the n trials.
The probability of x successes is given by: P(X = x) = [n! / (x!(n−x)!)] pˣ (1−p)ⁿ⁻ˣ
Binomial distributions are written as X~b(n,p), where n = number of trials and p = probability of success.
Lecture Seven:
A discrete random variable has a countable number of possible values.
A continuous random variable has an uncountable number of values within an interval between two points.
Normal Distribution:
The normal distribution is bell-shaped and symmetrical, and the total area under the curve is equal to 1.
X~N(µ,σ²) means X is normally distributed with mean µ and variance σ².
Standard Normal Distribution:
Any X value can be standardised: Z = (X − µ)/σ
The standard normal distribution has a mean of 0 and a standard deviation and variance of 1.
The z score shows the number of standard deviations the corresponding observation of x lies away from the population mean.
Finding an X value for a specific z: X = µ + σZ
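A minimal Python sketch of the binomial probability formula and the z-score standardisation; the values of n, p, µ, σ and x are hypothetical:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = n!/(x!(n−x)!) · pˣ(1−p)ⁿ⁻ˣ
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binom_pmf(3, 10, 0.5))   # P of exactly 3 successes in 10 fair trials

# Standardising X ~ N(µ, σ²): z counts standard deviations from µ
mu, sigma = 100.0, 15.0
z = (130.0 - mu) / sigma       # x = 130 lies 2 standard deviations above µ
x_back = mu + sigma * z        # X = µ + σZ recovers the original value
print(z, x_back)
```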
Lecture Eight: Good Friday (no lecture)
Lecture Nine:
If we take repeated samples of size n from a population X and record x̄ for each, the collection of x̄ values can be represented as a random variable X̄ with its own distribution. This is called a sampling distribution. The distribution of X̄ is different from the distribution of the population X.
The mean of the sampling distribution is µ_X̄. The standard deviation of the sampling distribution is σ_X̄, known as the standard error.
- The sampling mean is equal to the population mean: µ_X̄ = µ
- The standard error is less than the standard deviation: σ_X̄ < σ
- V(X̄) = σ_X̄² = σ²/n
- √V(X̄) = σ_X̄ = σ/√n
Central Limit Theorem: if repeated samples of size n > 30 are taken from X, the sampling distribution of X̄ will be approximately normal; the larger n is, the more accurate this approximation is. If X is normally distributed, X̄ will always be normally distributed.
Standardising X̄: Z = (X̄ − µ_X̄)/σ_X̄ = (X̄ − µ)/(σ/√n)
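A minimal simulation sketch of these results, using a hypothetical uniform (deliberately non-normal) population; it checks that the mean of the x̄ values is close to µ and that their spread is close to σ/√n:

```python
import random
import statistics

# Hypothetical non-normal population: uniform on [0, 10]
population = [random.uniform(0, 10) for _ in range(100_000)]
mu, sigma = statistics.mean(population), statistics.pstdev(population)

n = 36   # n > 30, so the CLT approximation applies
xbars = [statistics.mean(random.sample(population, n)) for _ in range(5_000)]

print(statistics.mean(xbars), mu)                # µ_X̄ ≈ µ
print(statistics.stdev(xbars), sigma / n**0.5)   # standard error ≈ σ/√n
```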
Lecture Ten:
X is a binomial random variable where p is the probability of success and q = 1 − p.
In general, for the binomial count X:
- µ = np
- σ² = npq
Population proportion refers to the number of times a specific outcome X occurs within a population: p = X/N
Sample proportion refers to the number of times a specific outcome X occurs within a sample: p̂ = X/n
In general, for the sample proportion:
- µ_p̂ = E(X/n) = p
- σ_p̂² = V(X/n) = pq/n
We can approximate the distribution of p̂ = X/n by a normal distribution with mean p and variance pq/n: p̂ ≈ N(p, pq/n), with standard error √(pq/n), provided np̂ and nq̂ are both ≥ 5.
Lecture Eleven:
Sample statistics are known functions of sample data. Before a sample is drawn from a population, a sample statistic is a random variable; once the sample is drawn, the statistic becomes a constant and is no longer random.
Random variables have probability distributions. The distribution of a statistic is called a sampling distribution. Exact sampling distributions depend upon the distribution of the population from which the sample is drawn, i.e. a normal population will produce a normal sampling distribution; however, if the sample size is >30, we can use the CLT to approximate normality.
An estimator is a sample statistic which is constructed to have specific properties.
E.g. to estimate the centre of a distribution by minimising the sum of squared deviations: ĉ = min_c Σᵢ₌₁ⁿ (Xᵢ − c)², i.e. ĉ = X̄
E.g. to minimise the sum of absolute deviations: ĉ = min_c Σᵢ |Xᵢ − c|, i.e. ĉ = m (the median)
Principles of Estimation:
Unbiasedness: E(θ̂) = θ; the estimator is right on average, in repeated samples.
Consistency: the probability of the estimator being wrong goes to zero as the sample size gets big: V(X̄) = σ²/n → 0 as n → ∞
We can construct point estimators to make a specific guess of the parameter value, or interval estimators to guess a range of values in which the parameter may lie. Sample statistics such as the mean, median and variance are all examples of point estimates of population parameters.
Confidence Intervals:
Rather than finding the probability content of a given interval, confidence intervals find the interval, on the basis of sample data, with a given probability content. These intervals can then be used to guess the location of population parameters.
If X̄~N(µ, σ_X̄²), then for given constants L and U: Pr(L ≤ X̄ ≤ U) = Pr(z_L ≤ Z ≤ z_U)
Additionally, Pr(z_L ≤ Z ≤ z_U) = Pr(X̄ − z_U σ_X̄ ≤ µ ≤ X̄ − z_L σ_X̄) = p, where z_L, z_U and µ are constants.
This gives us the confidence interval: [X̄ − z_U σ_X̄, X̄ − z_L σ_X̄]
This means that in repeated samples of size n, the probability of a randomly chosen interval covering the true population mean µ is p.
The interval is random because X̄ varies from sample to sample; the interval is therefore random, but µ is not.
The probability p is the confidence level of the interval: p = 1 − α
In constructing a confidence interval for µ we set z_L = −z_{α/2} and z_U = z_{α/2}.
The interpretation of the confidence interval is that we are x% confident the interval covers the mean.
The confidence interval estimator is X̄ ± z_{α/2}(σ/√n)
- As the level of confidence increases, the z score becomes more extreme and the interval widens
- As the population standard deviation increases, the standard error increases and the interval widens
- As the sample size increases, the standard error decreases, so the interval narrows
Lecture Twelve:
If we do not know µ or σ, we replace σ with s and build our confidence intervals using t-values: t = (X̄ − µ)/(s/√n), with n − 1 degrees of freedom.
If the table does not give the degrees of freedom you want, approximate, and write a note explaining the approximation used.
Confidence interval estimator: X̄ ± t_{α/2,df}(s/√n); this can only be used if X~N.
- As the level of confidence increases, the t score becomes more extreme, so the interval widens
- As the sample standard deviation increases, the standard error increases, so the interval widens
- As the sample size increases, the interval narrows
Comparison of z and t values: t values are more extreme than the corresponding z values at small degrees of freedom, and approach the z values as the degrees of freedom grow.
Interval estimators in general: let θ̂ be the estimator of a parameter θ, with the standard error of θ̂ being s_θ̂. A (1−α)100% confidence interval for θ is [θ̂ − c_{1−α/2} s_θ̂, θ̂ + c_{1−α/2} s_θ̂], where c_{1−α/2} cuts off an upper-tail probability of α/2.
To find a confidence interval estimate of a proportion we use: p̂ ± z_{α/2} √(p̂q̂/n)
- As the level of confidence increases, the z scores become more extreme, widening the interval
- As p̂ and q̂ approach 0.5, the standard error increases, so the interval widens
- As the sample size increases, the standard error decreases, so the interval narrows
The margin of error is half the width of an interval estimate, equal to z_{α/2}(σ/√n).
Any specified maximum allowable margin of error is called the error bound, B:
B = z_{α/2}(σ/√n) → n = (z_{α/2} σ / B)²
For proportions the error bound is B = z_{α/2} √(p̂q̂/n); to find n we set p̂ = 0.5, which gives a conservative, wide interval estimate, and then solve n = (z_{α/2} √(p̂q̂) / B)²
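A minimal Python sketch of these interval estimators. The standard library has no t-distribution, so this uses z critical values via NormalDist as a stand-in; for small n, replace z with t_{α/2, n−1} from tables, as the notes say. The sample data, σ guess and error bound are hypothetical:

```python
import statistics
from statistics import NormalDist

data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]   # hypothetical sample
n, alpha = len(data), 0.05
x_bar, s = statistics.mean(data), statistics.stdev(data)

z = NormalDist().inv_cdf(1 - alpha / 2)   # cuts off an upper-tail area of α/2
half_width = z * s / n**0.5               # margin of error
print(x_bar - half_width, x_bar + half_width)   # 95% interval estimate for µ

# Proportion interval: p̂ ± z·√(p̂q̂/n), for hypothetical p̂ = 0.4, n = 200
p_hat, m = 0.4, 200
se = (p_hat * (1 - p_hat) / m) ** 0.5
print(p_hat - z * se, p_hat + z * se)

# Sample size for an error bound B: n = (z_{α/2} σ / B)²; round up
sigma_guess, B = 0.3, 0.1
print((z * sigma_guess / B) ** 2)
```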
Lecture Thirteen:
- Estimators are statistics that are random variables before the sample is drawn
- We use estimators to guess unknown parameter values
- A point estimator gives a single guess of a parameter
- A confidence interval gives a range of parameter values that are consistent with the observed sample data
The null hypothesis is usually an assertion about a specific value of the parameter and always contains the '=' sign. The null is assumed true unless the evidence in the data supports the notion that it is not true.
The alternative hypothesis is the maintained hypothesis, where the truth lies if the null is not true. To reject the null hypothesis, statistically significant evidence must be found.
A Type I error occurs when we reject H₀ when H₀ is true. A Type II error occurs when we do not reject H₀ when H₀ is false.
Values of Z that are evidence in support of H₀ are in the acceptance region. All other values of Z are in the rejection region.
The level of significance, α, is the probability of the test statistic falling into the rejection region given H₀ is true.
The values of the test statistic that lie on the boundary between the acceptance and rejection regions are the critical values.
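A minimal sketch of the acceptance/rejection regions for a two-sided z-test at significance level α; the hypothesised mean, sample values and σ are all hypothetical:

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value ≈ 1.96

# Test H₀: µ = 50 against H₁: µ ≠ 50 (hypothetical numbers)
mu0, x_bar, sigma, n = 50.0, 51.2, 4.0, 64
z = (x_bar - mu0) / (sigma / n**0.5)

if abs(z) > z_crit:
    print("z in rejection region: reject H0")          # risk of a Type I error
else:
    print("z in acceptance region: do not reject H0")  # risk of a Type II error
```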