Economics

ECON 221

Pirapagaran Tharmalingam

Summer

ECON221 – Course Notes
Lecture 1
Learning Objectives
o Define Statistics
o Describe the uses of statistics
o Distinguish Descriptive & Inferential Statistics
o Define Population, Sample, Parameter, and Statistic
o Define Quantitative and Qualitative Data
o Define Random Sample
What is statistics?
o Define: A methodology for collecting, classifying, summarizing,
organizing, presenting, analyzing and interpreting numerical
information
Uses of Statistics
o Application Areas:
Economics
Forecasting, Demographics
Engineering
Construction & Materials
Sports
Individual and Team Performance
Business
Consumer preferences, financial trends
o Uses of statistics in Econ
It’s the methodology that we use to confront theories (the
theory of demand and other testable propositions) with the
facts
It is the set of procedures and intellectual processes by which
we decide whether or not to accept a theory as true
It is the process by which we decide what and what not to
believe
Use of stats leads us to:
A wider belief in the ‘truth’ of a particular theory OR
To its rejection as inconsistent with the facts
Descriptive & Inferential Statistics
o Descriptive Statistics
Description and presentation of data
Utilizes numerical and graphical methods
To find patterns in the data
To summarize the information it reveals
To present that information in a meaningful way
Descriptive statistics LACK a measure of reliability
o Inferential Statistics
Using the data to make inferences (estimates, decisions,
predictions) about features of the environment from which the
data were selected or about the underlying mechanism that
generated the data
Inferential statistics have a measure of reliability
E.g. an election poll is reported as accurate to within 3
percentage points
Key Terms
o Population
All items of interest
o Sample
Portion of population
o Parameter
Summary measure about population
o Statistic
Summary measure about a sample
Components of Statistical Procedures
o Population
Set of all objects or individuals of interest
E.g. all UW students, all econ majors, etc
o Variable
A characteristic or property of the population unit that is of
interest
E.g. height or GPA, for the population of UW students
o Sample
A subset of the units of a population
E.g. 4th-year students, 50 ECON 401 students
o Statistical inference
Estimation, prediction, or other generalization about a
population based on information in a sample
Examples:
Possible inferences from sample data:
o The average GPA of all econ majors is 87%
o 35% of all econ majors have a GPA of less than 75%
o The median GPA for econ majors is 85%
Check text ex. 1.1, 1.2
Data Sets
o Three kinds:
Cross sectional
Time-series
Panel Data
o Within data sets there are two kinds of data
Quantitative
Qualitative
Types of Data
o Quantitative Data
Measured on a naturally occurring scale
Equal intervals along scale (allows for meaningful
mathematical calculations)
Ratio data
Data with absolute zero (zero means no value)
o E.g. bank balance, grade
Interval data
Data with relative zero (zero has value) e.g.
temperature
o Qualitative Data
Measured by classification only
Non-numerical in nature
Ordinal data
Meaningfully ordered categories
o E.g. best to worst ranking, age categories
Nominal data
Categories without a meaningful order
o E.g. political affiliation, industry classification,
ethnic/cultural groups
Random Sample
o Every sample of size n has an equal chance of selection
Collecting Data
o Data sources
Published source – books, journals, abstracts
Primary vs. Secondary
Designed Experiment
Often used for gathering information about an
intervention
Survey
Data gathered through questions from a sample of
people
Observational Study
Data gathered through observation, no interaction with
units
Lecture 2
Learning Objectives
o Describe Qualitative Data
o Describe Quantitative Data
o Explain Numerical Data Properties
o Describe Summary Measures
o Analyze Numerical Data Using Summary Measures
Methods for Describing Sets of Data
o Describing Data using Graphs
o Describing Data using Charts
Data Presentation
o Qualitative Data
Summary Table
Bar graph, Pie Chart, Pareto Diagram
o Quantitative Data
Stem & Leaf Display
Frequency Distribution Histogram
How can Qualitative data be Described?
o Since the qualitative data are non-numeric in nature they are best
described using classes (also known as bins)
o Class frequency is the number of data points which fall into a class
o Class relative frequency is the number of data points in the class
divided by the total number of data points. Class percentage is class
relative frequency multiplied by 100.
o How can the Results be Summarized in a Frequency Table?
Frequency/Percent/Cumulative Percent
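A minimal sketch of building such a frequency table in Python. The list of majors is hypothetical data made up for illustration, and `frequency_table` is an illustrative helper name, not something from the course text.

```python
from collections import Counter

def frequency_table(values):
    """Return one row per class:
    (class, frequency, relative frequency, percent, cumulative percent)."""
    n = len(values)
    rows = []
    running = 0  # cumulative frequency so far
    # most_common() lists classes from most to least frequent
    for label, freq in Counter(values).most_common():
        running += freq
        rows.append((label, freq, freq / n, 100 * freq / n, 100 * running / n))
    return rows

# Hypothetical qualitative data: declared major of 10 students
majors = ["Econ", "Math", "Econ", "Arts", "Econ",
          "Math", "Econ", "Arts", "Math", "Econ"]

table = frequency_table(majors)
for label, freq, rel, pct, cum_pct in table:
    print(f"{label:<5} {freq:>3} {rel:>6.2f} {pct:>6.1f}% {cum_pct:>6.1f}%")
```

Ordering the classes from most to least frequent is the same ordering a Pareto diagram uses.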
o How are data displayed on a bar graph?
Bar graphs are more suitable when the purpose is the
comparison of categories
o How are data displayed using a pareto diagram?
o How are data displayed using a pie chart?
Pie charts are more suitable when the main objective is to
investigate the portion of the whole that is in a particular
category
How can Quantitative data be described?
o After looking at data:
o How does a dot plot work?
Data points line up on top of each other depending on the
frequencies
o What is a histogram?
A histogram is a graph in which classes are marked on the
horizontal axis and the class frequencies on the vertical axis.
The class frequencies are represented by the heights of the
bars and the bars are placed adjacent to each other
Information given by a histogram
Has the following strengths
o Provides visual indication of which class is more
frequent (modal information)
o Supplies an indication of the degree of spread, or
variation, of the data
o Displays the shape of the distribution
Has the following weaknesses:
o Can obscure time differences between data sets
o Can be manipulated to present the data in a
fashion different from reality
How is a histogram constructed?
Identify the largest and smallest value in the data set
(max and min)
Divide the interval between the max and the min into
sub intervals (classes, bins)
Keep in mind that each data point must fall into one and
only one class, and no data point may fall on a class
boundary
How many classes should the data be split into? There
are two rules which can help answer that question (let n
denote the sample size)
o (2n)^(1/3) ≈ (2n)^0.3333
o Sturges’ Rule: 1 + log(n) / log(2)
To determine the width of the subintervals divide the
range of data (max-min) by the number of classes
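The two rules and the class-width step can be sketched in Python as follows. The function names are illustrative, and rounding the result to a whole number of classes is left to the reader since the notes do not say which way to round.

```python
import math

def classes_cube_root(n):
    """Number of classes suggested by the (2n)^(1/3) rule."""
    return (2 * n) ** (1 / 3)

def classes_sturges(n):
    """Number of classes suggested by Sturges' rule: 1 + log(n)/log(2)."""
    return 1 + math.log(n) / math.log(2)

def class_width(data, k):
    """Width of each class: range of the data divided by k classes."""
    return (max(data) - min(data)) / k

n = 50  # hypothetical sample size
print(round(classes_cube_root(n), 2))  # suggestion from the cube-root rule
print(round(classes_sturges(n), 2))    # suggestion from Sturges' rule
```

The two rules can give noticeably different answers for the same n, which is why the notes also list rule-of-thumb tables.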
Determining the number of classes
Course text PAGE 58
o # of observations in a data set v. number of
classes
less than 25 = 5-6 classes
25-50 = 7-14 classes
more than 50 = 15-20 classes
Newbold, P., Carlson, W.L. and Thorne, B.
o Sample size v. Number of classes
Less than 50 = 5-6 classes
50-100 = 6-8 classes
More than 100 = 8-10 classes
Numerical Data Properties and Measures
o Central Tendency
Mean, Median, Mode
o Variation
Range, Interquartile Range, Variance, Standard Deviation
o Relative Standing
Percentiles, Z-Scores
Notation that will be useful for this course:
o Summation Notation: ∑ (Greek letter sigma)
Let’s say we have n observations, denoted x_1, x_2, …, x_n, with
each observation identified by its subscript
The expression $\sum_{i=1}^{n} x_i$ is equal to
$x_1 + x_2 + x_3 + \dots + x_n$ and reads “sum of x_i for i equals 1 to n”
o What is central tendency?
Central tendency is the tendency of data to center about certain
numerical values
A measure of central tendency is a single value that
summarizes a set of data. It locates the center of the values
As shown previously, the most common measures are MEAN
(arithmetic average), median (positional center), and mode
(most frequent value)
Calculating Mean
The arithmetic mean (which is usually referred to as,
simply the mean) is the sum of data values divided by
the number of observations
Population mean would be denoted as μ (Greek letter
‘mu’). This is a parameter.
Sample mean would be denoted as x̄ (read as ‘x-bar’).
This is a statistic.
Calculating mean for ungrouped items:
Sample mean:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean (value typically not known), where N = population size:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Calculating the Mean for grouped data
o Sometimes the data you encounter are divided into
classes and all you can observe are the class
frequencies (or relative frequencies) and the means of
the observations in the classes
o In this case the mean of the data set would be the
weighted mean, with weights being the
frequencies of the respective classes:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i \bar{x}_i}{n}, \quad n = \sum_{i=1}^{k} f_i$$
o Here there are k classes, f_i denotes the class
frequency, and the x̄_i indicate the class means
o This formula also applies in cases when the x̄_i
are unobservable but the frequency of each class
is still known
o In this case the class mid-points would be substituted for
the class means
Main disadvantage of the mean?
o The mean is easily influenced by extreme
values (for example outliers, values which fall
far from the main cluster of data)
o Let’s demonstrate it with an example: suppose we
have the following data 2,2,2,3,3,3,6 and then replace
one of the 3’s with an outlier 17
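Worked in Python, using the data from the example above (the median is included only for contrast and is an addition to the example):

```python
from statistics import mean, median

original = [2, 2, 2, 3, 3, 3, 6]
with_outlier = [2, 2, 2, 3, 3, 17, 6]  # one of the 3's replaced by 17

print(mean(original), median(original))          # mean and median before
print(mean(with_outlier), median(with_outlier))  # mean and median after
```

The mean moves from 21/7 = 3 to 35/7 = 5 because of a single extreme value, while the median stays at 3.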
The Role of the Mean in “Balancing” the data …
o A histogram ‘balances’ when supported by the
mean (in this case 140.6)
What is Median?
The Median is the middle observation of a set of
observations that are arranged in increasing or
decreasing order
If the number of observations in a sample is odd then
the median is the middle observation.
How does one locate the median?
o The median would be located in the position
number 0.5(n+1) if the sample has an
odd number of observations
o If the sample has an even number of observations, the first
number used in calculating the median is located at 0.5n and
the second one at 0.5n+1 (the median is the average of the two)
What is the mode?
The mode, if one exists, is the most frequently occurring
value
The data can be: uni-modal, bi-modal, and multi-modal
Data displayed in a histogram can have a modal class
(the class with the highest frequency)
o Which of these measures are unique
Mean is a unique measure, the data set has only one mean
Median is a unique measure; even a data set with an even
number of observations has one median (even though it is
calculated with the two middle observations)
Mode may not exist and, if it exists, may not necessarily be
unique
o How much information is taken into account by each of these
measures?
Mean uses all of the information
Median uses less information
Mode uses the least
What is skewness?
o The data is symmetric if the distribution has the same shape on either
side of the center
o Otherwise the data is skewed. The distribution extends more to one
side than to the other
o This is caused by extreme values “dragging” the mean to their side
o How can skewness be measured?
o Skewness can be measured by the AVERAGE CUBED DEVIATION from
the sample mean (for example: if the large deviations are
positive, the data will be positively skewed). NOT ON final (formula below):
$$m_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n-1}$$
Why does one need numeric measures of variability?
o The mean (and other central tendencies) does not provide sufficient
or complete description of data
o Variability indicates how spread out the data are over all possible values
o Measures of variability are numbers which describe how spread out the
data are
o Most commonly used measures of variability include: range, variance,
and standard deviation
How is Range Calculated?
o Range is the difference between the largest and the smallest
observation
o A larger value of this measure of variability indicates a greater
spread of the data
o Range is vulnerable to extreme values
o Range also loses sensitivity when the number of observations is large
o What other types of Range Measures exist?
Quartiles divide a set of observations into four equal parts
The interquartile Range measures the spread of the middle
50% of the data. It is calculated as follows: IQR = Q3 – Q1
Semi-interquartile range calculated as: (Q3-Q1)/2
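A short Python sketch of these range measures. Textbooks differ on exactly how quartiles are interpolated; this example assumes the "inclusive" method of the standard-library `statistics.quantiles`, which may not match the convention used in the course text. The data are hypothetical.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical observations

data_range = max(data) - min(data)

# Quartiles split the ordered data into four equal parts.
# method="inclusive" is an assumption; other conventions give other values.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1        # spread of the middle 50% of the data
semi_iqr = iqr / 2   # semi-interquartile range

print(data_range, q1, q2, q3, iqr, semi_iqr)
```

Unlike the range, the IQR ignores the most extreme observations on either side.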
What is Variance?
o The sample variance is the sum of squared deviations from the mean
divided by (n-1).
o We use squared deviations because the sum of the (unsquared) deviations
from the mean is always equal to zero
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
o The population variance is the sum of squared deviations from the
mean divided by N, which denotes the population size
o How is Variance from a Frequency Distribution Calculated?
Can be approximated using the class means (x̄_i)
and the sample mean (x̄)
$$s^2 = \frac{\sum_{i=1}^{k} f_i (\bar{x}_i - \bar{x})^2}{n-1}$$
Note that, with k classes,
$$n = \sum_{i=1}^{k} f_i$$
What is Standard Deviation?
o Square root of variance. It is a meaningful measure of data variability.
o Sample standard deviation (denoted s) would be calculated
according to the following formula:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{s^2}$$
o Population standard deviation will be denoted as σ (“sigma”)
o In most cases the population standard deviation is not observable
since the population variance is unobservable
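A minimal Python sketch of the sample and population versions of variance and standard deviation. The data set is hypothetical, and the function names are illustrative.

```python
from math import sqrt

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    big_n = len(xs)
    mu = sum(xs) / big_n
    return sum((x - mu) ** 2 for x in xs) / big_n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

# The plain deviations cancel out, which is why they are squared
deviation_sum = sum(x - sum(data) / len(data) for x in data)

s2 = sample_variance(data)
s = sqrt(s2)
sigma = sqrt(population_variance(data))
print(deviation_sum, s2, s, sigma)
```

Dividing by n - 1 rather than N makes the sample variance slightly larger than the population formula applied to the same numbers.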
Some properties of Standard deviation
o Standard deviation is always greater than or equal to zero. This value
would equal zero when all the observations are the same.
o Larger values of standard deviation indicate greater spread of data
o Helps us determine the likely size of the chance error in a measurement
How can we interpret standard deviation?
o It tells us what share of the observations fits within a certain
number of standard deviations of the mean.
o There are two rules which describe that amount …
o 1) Empirical Rule
Sometimes referred to as the Normal rule.
For a symmetrical, bell shaped frequency distribution
approximately 68 percent of the observations will lie
within plus and minus one standard deviation of the
mean; about 95 percent of the observations will lie
within plus and minus two standard deviations of the
mean; and practically all (99.7 percent) will lie within
plus and minus three standard deviations of the mean
o 2) Chebyshev’s Rule
Applies to unknown distributions: at least 0, 3/4 and 8/9 of the
observations lie within 1, 2 and 3 standard deviations of the mean
Empirical rule: distribution known (bell shaped); almost all inferential
statistics is based on the empirical rule (68/95/99.7)
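The Chebyshev figures (0, 3/4, 8/9) come from the bound 1 - 1/k^2 for k standard deviations. A short Python sketch comparing it with the Empirical Rule percentages quoted above (the names are illustrative):

```python
from fractions import Fraction

def chebyshev_lower_bound(k):
    """At least this share of observations lies within k standard
    deviations of the mean, whatever the shape of the distribution."""
    return max(Fraction(0), 1 - Fraction(1, k * k))

# Approximate shares under the Empirical Rule (bell shaped data only)
EMPIRICAL = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (1, 2, 3):
    print(k, chebyshev_lower_bound(k), EMPIRICAL[k])
```

Chebyshev's bound is weaker because it has to hold for every distribution, while the Empirical Rule assumes a symmetrical, bell shaped one.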
Box Plot
[Figure: box plot marking Q3 + (1.5*IQR), Q3, Q1 and Q1 – (1.5*IQR)]
Lecture Three
Relative Standing Measures
o Descriptive measures of the relationship of a measurement to the rest of
the data
o Common Measures
Percentile ranking/score
Percentile rankings make use of the pth percentile
For any p, the pth percentile
o Has p% of the measures lying below it and
o (100-p)% above it
The median is the 50th percentile
o 50% of observations above and below
Percentile given a score
o Percentile of score x = (# of scores less than
x / total number of scores) * 100
Finding the score given a percentile
o Position of the score in the ordered data = (p/100) * n
Z-score
z = (x – mean) / standard deviation
Z-scores follow the empirical rule for mounded
distributions
Values with |z| above 2 are unusual and values with |z| above 3
are outliers
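A small Python sketch of both relative-standing measures. The scores and the mean and standard deviation used below are hypothetical, and the cut-offs of 2 and 3 are the ones given in these notes.

```python
def percentile_rank(scores, x):
    """Percent of scores strictly less than x."""
    return 100 * sum(s < x for s in scores) / len(scores)

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def classify(z):
    """Label a z-score using the cut-offs from the notes."""
    if abs(z) > 3:
        return "outlier"
    if abs(z) > 2:
        return "unusual"
    return "typical"

scores = [50, 60, 70, 80, 90]     # hypothetical test scores
print(percentile_rank(scores, 80))

z = z_score(130, mean=100, sd=15)  # hypothetical mean and sd
print(z, classify(z))
```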
Outliers
o Outlier: an observation that is unusually large/small relative to the data
being described
o Can have a dramatic effect on the mean, standard deviation, and scale of a
histogram
o Causes: invalid measurement, misclassified measurement, a rare
(chance) event
o 2 detection methods
Box Plots
Requires the lower quartile, middle quartile (median), upper
quartile, and IQR
Reveals the: center, spread, distribution, presence of
outliers
Excellent for comparing two or more data sets
Z-Scores
Graphing Bivariate Relationships
o A graph of two quantitative variables can show whether the relationship
between them is positive, negative, or absent
o Time Series Plot
Data produced over time (time on horizontal)
Points connected by straight lines
Distorting Truth
o Errors in presenting data
Using ‘chart junk’
No relative basis in comparing data batches
Compressing the vertical axis
No zero point on the vertical axis
Lecture Four
Learning Objectives
Define Experiment, Outcome, Sample Point, Sample Space, Event & Probability
o Events, Sample Spaces and Probability
Experiment
Process of observation that leads to a single outcome
with no predictive certainty (tossing 2 coins)
Sample point/simple event
Most basic outcome of an experiment or event that
cannot be broken down into simpler components
Sample Space
A listing of all sample points for an experiment
Sample Point probability
Relative frequency of the occurrence of the sample
point, e.g. # of HT outcomes out of all outcomes observed
o Sample Space Properties
Mutually Exclusive
2 outcomes can not occur at the same time
o Male and Female in same person
Collectively Exhaustive
One outcome in sample space must occur
o Male or Female
o Sample Space Examples
Observe gender: {Male, Female}
Play a football game: {Win, Lose, Tie}
Select 1 card, note color: {Red, Black}
o Events
Any collection of sample points
Simple Event
Outcome with one characteristic
Compound Event
Collection of outcomes or simple events
Two or more characteristics
Joint Event is a special case
o Two events occurring simultaneously
Use of Venn Diagram, Two-Way Table, or Tree Diagram to find Probabilities
[Figure: Venn diagram for tossing 2 coins. The compound event (at least one tail) lies inside the circle, the outcome HH lies outside it, and S indicates the sample space]
Describe and Use Probability Rules
o What is Probability
Numerical measure of the likelihood that an event will occur
P(Event)
P(A)
Prob(A)
Lies between 0 & 1
Sum of the probabilities of all sample points is 1
o Probability
P(Event) = X/T
X = number of event outcomes
T = Total number of sample points in Sample Space
Each of T sample points is equally likely – P (sample
point) = 1/T
Approaches to Probability
Relative Frequency Approximation
o Conduct (or observe) an experiment a large
number of times and count the number of times
event A actually occurs, then an estimate of P(A)
is
P(A) = # of times A occurred/# of times
trial was repeated
Note: this is an approximation of the actual
probability
The Classical Approach
o Requires equally likely outcomes
o If a procedure has n different simple events, each
with an equal chance of occurring, and s is the
number of ways event A can occur, then
o P(A) = s/n = # of ways A can occur/# of different
simple events
o This is actual probability
The Subjective Probabilities Approach
o P(A), the probability of A, is found by simply
guessing or estimating its value based on
knowledge of the relevant circumstances
o So the individual assigns probabilities based on
personal experience, anecdotal evidence, etc.
o E.g. probability of conservatives winning the
election is .7
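The classical and relative frequency approaches can be compared with a short simulation. This is a sketch under assumptions: it reuses the two-coin experiment from earlier in the lecture with the event "at least one tail", and the number of trials and the seed are arbitrary choices.

```python
import random
from fractions import Fraction
from itertools import product

# Classical approach: count equally likely sample points
sample_space = list(product("HT", repeat=2))   # HH, HT, TH, TT
event_a = [pt for pt in sample_space if "T" in pt]
classical = Fraction(len(event_a), len(sample_space))

def relative_frequency(trials, seed=0):
    """Relative frequency approach: repeat the experiment and count
    how often event A (at least one tail) actually occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        toss = (rng.choice("HT"), rng.choice("HT"))
        if "T" in toss:
            hits += 1
    return hits / trials

estimate = relative_frequency(100_000)
print(classical, estimate)
```

The simulated value is only an approximation; it gets closer to the classical value as the number of trials grows.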
Lecture Five
Compound Events
o Union
Outcomes in either event A or B or both
‘OR’ statement
∪ symbol (i.e., A ∪ B)
o Intersection
Outcomes in both events A and B
‘AND’ statement
∩ symbol (i.e., A ∩ B)
o Compound Event Probability
Numerical measure of likelihood that compound event will
occur
Can often use a two-way table
Two variables only
Formula Methods
Additive Rule
Conditional Probability Rule
Multiplicative Rule
o Complementary
The event that A does not occur
All sample points not in A: A^c
P(A) + P(A^c) = 1
Mutually Exclusive Events
o Events do not occur simultaneously
o A ∩ B does not contain any sample points
E.g. drawing a spade and drawing a heart on a single draw
Additive Rule
o Used to get compound probabilities for unions of events
o P(A or B) = P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
o For Mutually Exclusive Events: P(A or B) = P(A) + P(B)
Conditional Probability
o Event Probability GIVEN that another event occurred
o Revise sample space to account for new information
Eliminates certain outcomes
o P(A|B) = P(A ∩ B) / P(B)
Statistical Independence
o Event occurrence does not affect probability of another event
E.g. toss one coin twice
o Causality not implied
o Tests for independence
P(A|B) = P(A)
P(A ∩ B) = P(A) * P(B)
Multiplicative Rule
o Used to get compound probabilities for intersection of events (joint
events)
o P(A and B) = P(A ∩ B) = P(A) * P(B|A) = P(B) * P(A|B)
o For Independent Events:
P(A and B) = P(A ∩ B) = P(A) * P(B)
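The additive, conditional and multiplicative rules can be checked on a standard 52-card deck. The choice of events (hearts, face cards, spades) is an illustration added here, not taken from the lecture.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))  # sample space: 52 equally likely cards

def prob(event):
    """Classical probability of an event (a set of cards)."""
    return Fraction(len(event), len(deck))

hearts = {c for c in deck if c[1] == "hearts"}
spades = {c for c in deck if c[1] == "spades"}
faces = {c for c in deck if c[0] in {"J", "Q", "K"}}

# Additive rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union = prob(hearts) + prob(faces) - prob(hearts & faces)

# Mutually exclusive events: the intersection is empty
p_heart_or_spade = prob(hearts) + prob(spades)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_heart_given_face = prob(hearts & faces) / prob(faces)

# Independence test: P(A and B) = P(A) * P(B)
independent = prob(hearts & faces) == prob(hearts) * prob(faces)

print(p_union, p_heart_or_spade, p_heart_given_face, independent)
```

Here knowing that the card is a face card does not change the probability that it is a heart, so the two events are independent even though they are not mutually exclusive.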
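Bayes’s Rule
o Allows computation of an unknown conditional probability, P(B|A), by
converting it to a known conditional probability, P(A|B)
o For k mutually exclusive and exhaustive events B_1, …, B_k:
$$P(B_1 \mid A) = \frac{P(B_1)\,P(A \mid B_1)}{\sum_{j=1}^{k} P(B_j)\,P(A \mid B_j)}$$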
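A sketch of Bayes's rule in its standard form, P(B_i|A) = P(B_i) P(A|B_i) / sum over j of P(B_j) P(A|B_j), applied to made-up numbers. The two-machine scenario, its probabilities, and the function name are hypothetical and only serve to illustrate the calculation.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Given P(B_j) and P(A|B_j) for mutually exclusive, exhaustive
    events B_1..B_k, return the posteriors P(B_j|A)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_j and A)
    total = sum(joint)                                    # P(A)
    return [j / total for j in joint]

# Hypothetical example: machine 1 makes 60% of output, machine 2 makes 40%
priors = [Fraction(3, 5), Fraction(2, 5)]        # P(B_1), P(B_2)
# Hypothetical defect rates: P(defect | machine)
likelihoods = [Fraction(1, 50), Fraction(1, 20)]  # P(A|B_1), P(A|B_2)

posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)  # P(B_1|A), P(B_2|A) given that an item is defective
```

Although machine 1 makes most of the output, a defective item is more likely to have come from machine 2 because of its higher defect rate.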
Counting
Factorial:
o The factorial of a non-negative integer n (n greater than or equal
to zero) is the product of all integers 1, 2, …, n
o Factorial Notation:
n! = 1x2x..xn, for n > 0 and n! = 1 for n=0
o Recursive Formula:
n! = (n-1)! x n
o Approximation formula:
Stirling’s formula: n! ≈ sqrt(2πn) (n/e)^n
Where π = 3.1415927 and e = 2.7182818
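The recursive formula and Stirling's approximation, sketched in Python (function names are illustrative):

```python
import math

def factorial(n):
    """n! via the recursive formula n! = (n-1)! x n, with 0! = 1."""
    if n < 0:
        raise ValueError("factorial is defined for n >= 0 only")
    return 1 if n == 0 else factorial(n - 1) * n

def stirling(n):
    """Stirling's approximation: sqrt(2*pi*n) * (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (5, 10, 20):
    exact = factorial(n)
    approx = stirling(n)
    print(n, exact, round(approx), round(approx / exact, 4))
```

The ratio of the approximation to the exact value moves toward 1 as n grows.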
Methods for Counting Outcomes
o Use:
We need to be able to count the number of simple events in a
compound event and possible events in the sample space
o Basic Rules
Addition Rule
Suppose event A can happen in p ways and event B in q
ways
Then either event A OR event B, but not both, can
happen in p+q ways
Multiplication Rule
Suppose event A can happen in p ways and an unrelated
event B in q ways
Then both event A AND B can happen in p x q ways
Without replacement (1/5 -> 1/4 -> 1/3, etc.)
o Ordering Matters (PERMUTATIONS)
Suppose that n distinct objects are to be ‘drawn’ sequentially
or ordered from left to right in a row
The number of ways to arrange n distinct objects in a row is n!
Explanation: we can fill the first position in n ways, the next in
n-1 ways, etc.
The number of ways to select r objects in order from n distinct
objects is n(n-1)(n-2)…(n-r+1)
By the rth pick, (r-1) objects have already been used
Described as n taken to r terms
Therefore:
n^(r) = n! / (n-r)!
o Order does not matter (COMBINATIONS)
Suppose that n distinct objects are to be ‘drawn’ without
replacement
The number of ways to choose r objects from n is denoted by
(n over r), called “n choose r”
Combinations are also used when the number of sample points
is too large to enumerate
Formula:
(n over r) = n^(r) / r! = n! / (r!(n-r)!), also written nCr
Proof
The number of ways to choose r objects from n and
arrange them from left to right is n^(r)
Any choice of r objects can be arranged in r! ways
Therefore:
o (# of ways to choose r objects from n) x r! = n^(r)
o # of ways to choose r objects from n = n^(r) / r!
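Both counts, and the step in the proof that links them, can be checked in Python (3.8 or later for `math.perm` and `math.comb`). The five letters are a hypothetical set of distinct objects.

```python
import math
from itertools import combinations, permutations

objects = "ABCDE"  # n = 5 hypothetical distinct objects
n, r = len(objects), 3

n_perm = math.perm(n, r)  # order matters: n! / (n-r)!
n_comb = math.comb(n, r)  # order does not matter: n! / (r!(n-r)!)

# Cross-check the formulas by listing every outcome
listed_perms = list(permutations(objects, r))
listed_combs = list(combinations(objects, r))

print(n_perm, n_comb)
# Each choice of r objects can be arranged in r! ways
print(n_comb * math.factorial(r) == n_perm)
```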
With replacement
o Never considered in this course
Statistical Independence = lack of correlation
Mutually exclusive = no sharing of elements
Lecture 6
Learning Objectives
Distinguish between the Two Types of Random Variables
o A random variable is a numerical-valued function defined on the
outcomes of an experiment
o A variable (typically represented by x) that has a single numerical
value, determined by chance, for each outcome of a procedure
o A variable that assumes numerical values associated with random
outcomes of an experiment
o Two Types
Discrete Random Variable
Has either a finite number of values or a countable
number of distinct possible values
o Where ‘countable’ refers to the fact that there
might be infinitely many values, but they result
from a counting process
o Test: for any given value of the random variable
you can designate the next largest or next
smallest value of the random variable
o Poisson random variable is exception
o Ex. #Sales, #Correct,
Continuous Random Variable
Random variable that has an infinite number of distinct
possible values
o Values can be associated with measurements on
a continuous scale with no gaps or interruptions
o So the variable can take on all possible values in
an interval of numbers
o Test: given a particular value of the random
variable, you cannot designate the next largest or
next smallest value
o Ex. Weight, Hours
Discrete random variables “count”
Continuous random variables “measure” (Length, width,
height, etc)
Another distinction: quantitative order
Discrete random variable
o Sample points can be enumerated or listed in
order
Continuous
o Not possible to list sample points in order
Describe Discrete Probability Distributions
o Discrete Probability Distribution
List of all possible [x, p(x)] pairs
X = value of random variable (outcome)
P(x) = probability associated with value
Mutually exclusive (no overlap)
Collectively exhaustive (nothing left out)
0 ≤ p(x) ≤ 1 for all x
Sum of p(x) over all x = 1
o Summary Measures
Expected Value (mean of probability distribution)
Weighted average of all possible values
Mean = E(X) = Σ x p(x)
Variance
Standard Deviation
Square root of variance
o Interpretation
E(x) is NOT the value of the random variable x that you
“expect” to observe if you perform experiment once but rather
a “long run” average
o Variance of Discrete Random Variables
Variance = Σ (x – mean)^2 p(x)
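A short Python sketch of these summary measures. The distribution used, the number of heads in two tosses of a fair coin, is an example chosen here for illustration.

```python
from fractions import Fraction
from math import sqrt

# Discrete probability distribution as {x: p(x)}:
# number of heads in two tosses of a fair coin
dist = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected_value(dist):
    """E(X) = sum of x * p(x): a probability-weighted average."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - mean)^2 * p(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

total_probability = sum(dist.values())  # must equal 1
mu = expected_value(dist)
var = variance(dist)
sd = sqrt(var)
print(total_probability, mu, var, sd)
```

The expected value is a long-run average: over many repetitions of the two tosses, the average number of heads settles near it.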
Describe the Binomial and Poisson Distributions
o Binomial Distribution
Number of ‘successes’ in a sample of n observations (trials)
Number of reds in 15 spins of a roulette wheel
Number correct on a 33 question exam
Binomial Distribution Properties
Two different sampling methods
o Infinite population without replacement
o Finite population with replacement
Sequence of n identical trials
Each trial has 2 outcomes
o ‘Success’ (desired outcome) or ‘failure’
Constant trial probability
Trials are independent
Binomial Probability Distribution Function
$$p(x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!(n-x)!}\, p^x (1-p)^{n-x}$$
p(x) = probability of x successes, n = sample size, p =
probability of success, q = 1 – p, x = # of successes in a sample
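The binomial probability function, p(x) = n! / (x!(n-x)!) * p^x * (1-p)^(n-x), sketched in Python. The values n = 3 and p = 0.5 are hypothetical, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 3, 0.5  # hypothetical: 3 tosses of a fair coin, success = heads
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(probabilities)

# Mean of the distribution, computed from the definition sum of x * p(x)
mean = sum(x * px for x, px in enumerate(probabilities))
print(mean)
```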
o Poisson Distribution
Discrete probability distribution that applies to occurrences of
some event over a specific interval (integer value from zero to
infinity)
The value gives the number of occurrences of the circumstance
of interest during period
Events PER UNIT
Time , length, area, space
Poisson Process
Constant event probability: an average of 60/hour is
1/min for each of 60 one-minute intervals
One event per interval (don’t arrive together)
Independent events
o Arrival of 1 person does not affect another’s
arrival
Function
p(x) = probability of x given the expected (mean) # of ‘successes’ per unit, λ
$$p(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
Expected value (mean) = variance = λ
Standard deviation = sqrt(λ)
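The Poisson probability function in its standard form, p(x) = λ^x e^(-λ) / x! with λ the mean number of events per unit, sketched in Python. The value λ = 2 is hypothetical, and the sums are cut off at x = 59 because terms beyond that are negligibly small for this λ.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the mean per unit is lam."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 2.0  # hypothetical mean number of arrivals per minute
xs = range(60)  # truncated support; the tail is negligible here

total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(poisson_pmf(0, lam), total, mean, var)
```

The computed mean and variance both come out at λ, which is the property stated in the notes.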
Describe the Uniform and Normal Distributions
o Uniform Distribution
The Uniform Distribution
Characteristics
o Uniform probability distributions result:
When the probabilities of all occurrences in
the sample space are the same OR
When a continuous random variable is
evenly distributed over a particular
interval
o Are these probability distributions discrete or
continuous?
They may be either discrete or
continuous
Discrete Uniform Distribution
o Consider a random number generator that
cranks out random numbers between 0 and 9
o By construction of the computer program, the
probability that any one of the 10 numbers will
turn up is 1/10 or .1
o Therefore the probability distribution is p(x) = .1 for every
x
o The discrete probability function is:
P(x) = 1/s where P(x) = P(X=x) and x = a,
a+1, a+2, …, a+(s-1); a denotes the smallest outcome and s
denotes the number of distinct outcomes
Eyeball the mean (median)
Mean: E(x) = a + (s-1)/2
Variance: (s^2 - 1) / 12
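The mean and variance formulas, a + (s-1)/2 and (s^2 - 1)/12, can be checked against the definitions for the random number generator example (a = 0, s = 10). The function name is illustrative.

```python
from fractions import Fraction

def discrete_uniform_summary(a, s):
    """Mean and variance of a discrete uniform distribution on
    a, a+1, ..., a+(s-1), computed directly from the definitions."""
    outcomes = range(a, a + s)
    p = Fraction(1, s)  # every outcome is equally likely
    mean = sum(x * p for x in outcomes)
    variance = sum((x - mean) ** 2 * p for x in outcomes)
    return mean, variance

a, s = 0, 10  # random digits 0 through 9
mean, variance = discrete_uniform_summary(a, s)

print(mean, a + Fraction(s - 1, 2))          # definition vs. formula
print(variance, Fraction(s ** 2 - 1, 12))    # definition vs. formula
```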
o Probability Distributions for Continuous Random Variables
Continuous Probability Density Function
Mathematical Formula
Shows all values x, and frequencies f(x)
o f(x) is NOT probability
Properties: ∫ f(x) dx = 1 over all x (area under curve)
f(x) ≥ 0
P(z > a) denotes the probability that z score is greater
than a
P (z
