Introductory Statistics for Economists
ECON 2500 – Winter 2011 – Xianghong Li
Chapter 1 – Looking at Data – Distributions – Jan 4
- Context and purpose of a data set.
- Observations: individuals or firms.
- Variable: any characteristic of an individual.
- Quantitative variable: takes on numerical values. Unit of measurement, e.g.
hourly wage in dollars.
- Categorical variable: places an individual into one of several groups or
categories. E.g. gender.
- Not all observations (observed values of the variables) are the same – variation.
- The pattern of variation of a variable is called its distribution. A distribution is a
summary of the values a variable takes and how often it takes the different values.
- Using R to tabulate frequencies.
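As a minimal sketch of tabulating frequencies in R, the example below counts how often each value of a categorical variable occurs. The data vector and the variable names are made up for illustration; table() and prop.table() are base R functions.

```r
# Hypothetical data: a categorical variable observed for eight individuals.
gender <- c("F", "M", "F", "F", "M", "F", "M", "F")

# table() counts how often each value occurs.
counts <- table(gender)
print(counts)

# prop.table() converts counts to proportions; multiply by 100 for percent.
percent <- 100 * prop.table(counts)
print(round(percent, 1))
```

The counts and percents together are one way to write down the distribution of the variable.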
1.1 Displaying Distributions with Graphs
How to Draw a Stemplot
- Sort observations and rank them from the smallest to the largest.
- Write down the stems according to the range of the data.
- Add leaves.
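The three steps above can be tried out in R with a short sketch like the following. The scores are invented for illustration. The base R function stem() chooses the stems and adds the leaves itself, so the explicit sort is shown only to mirror the first step.

```r
# Hypothetical small data set (e.g. exam scores).
scores <- c(67, 52, 88, 74, 91, 79, 65, 83, 70, 58, 76, 85)

# Step 1: sort the observations from smallest to largest.
sorted_scores <- sort(scores)
print(sorted_scores)

# Steps 2 and 3: stem() picks stems from the range of the data
# and attaches the leaves. It would also accept unsorted input.
stem(sorted_scores)
```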
How to Draw a Histogram
- Sort the observations.
- Decide (equal width) intervals.
- Make a summary table (counts and percent).
o Choose counts for small data sets.
o Choose percent for large data sets.
- Graph histogram (no gap between columns).
- Interval choice: use your judgement; choose the intervals that show the overall pattern best.
- Overall pattern: shape, spread, center.
- Deviations from the overall pattern, especially outliers.
- Mode: major peak(s) of a distribution.
- Symmetric: mirror images on each side of the midpoint (think of a stemplot).
- Skewed to the right: the right tail is much longer.
- Skewed to the left: the left tail is much longer.
- Bar and pie charts: categorical variables.
o Pie: requires all the categories that make up a whole.
o Bar: more flexible.
- Stemplot: suitable for small data sets.
- Histogram: suitable for large data sets.
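The histogram steps listed earlier in this section (equal-width intervals, a summary table of counts and percents, then the graph) can be sketched in R roughly as follows. The wage data and the interval width of 5 dollars are assumptions chosen for illustration; cut(), table() and hist() are base R functions.

```r
# Hypothetical data: hourly wages in dollars.
wage <- c(8.5, 9, 10.25, 11, 12, 12.5, 13, 14, 14.75,
          15, 16, 18, 19.5, 22, 27)

# Equal-width intervals: [5, 10), [10, 15), ..., [25, 30).
breaks <- seq(5, 30, by = 5)
interval <- cut(wage, breaks = breaks, right = FALSE)

# Summary table of counts and percents per interval.
counts <- table(interval)
summary_table <- data.frame(
  interval = names(counts),
  count = as.vector(counts),
  percent = round(100 * as.vector(counts) / length(wage), 1)
)
print(summary_table)

# Histogram on the same intervals; the bars are drawn with no gaps.
hist(wage, breaks = breaks, right = FALSE,
     main = "Hourly wage", xlab = "Dollars per hour")
```

Changing the by = 5 width and redrawing is a quick way to judge which intervals show the overall pattern best.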
1.2 Describing Distributions with Numbers
- Math rep: summation operators.
- Center of distribution: mean and median.
- Spread of distribution: quantiles and standard deviation.
Measuring Center: Mean
- The mean is where the histogram balances.
- It is not resistant to outliers or skewness of a distribution.
o A simple example.
o Example 5: The mean highway mileage for the two-seater without Honda
- R command.
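As an illustration of the R command for the mean, the sketch below computes it once with the summation operator and once with the built-in mean(). The mileage figures are hypothetical and are not the data from Example 5.

```r
# Hypothetical data: highway mileage (mpg) for six cars.
mpg <- c(24, 26, 23, 28, 25, 24)

# The mean written with the summation operator: (1/n) * sum of x_i.
n <- length(mpg)
mean_by_hand <- sum(mpg) / n

# The built-in command gives the same value.
mean_builtin <- mean(mpg)
print(c(mean_by_hand, mean_builtin))

# Balance point: the deviations from the mean sum to zero.
print(sum(mpg - mean_builtin))
```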
Measuring Center: Median
- The middle of a distribution, how to get it:
o Sort all observations.
o If the number of observations is odd, the median is the center observation.
o If the number of observations is even, the median is the mean of the two
center observations.
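The steps above can be written as a small R function and checked against the built-in median(). The function name median_by_hand and the two samples are hypothetical, introduced only for this sketch.

```r
# Median following the steps: sort, then take the center observation
# (odd n) or the mean of the two center observations (even n).
median_by_hand <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    mean(s[c(n / 2, n / 2 + 1)])
  }
}

odd_sample  <- c(7, 3, 9, 1, 5)       # sorted: 1 3 5 7 9
even_sample <- c(7, 3, 9, 1, 5, 11)   # sorted: 1 3 5 7 9 11

print(median_by_hand(odd_sample))     # 5
print(median_by_hand(even_sample))    # 6

# The built-in command agrees.
print(median(odd_sample))
print(median(even_sample))
```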
Mean Versus Median
- For an exactly symmetric distribution, the median and mean are the same.
- Median is less sensitive to outliers and skewness of a distribution, while mean is
very sensitive to both.
- Mean often turns out to be a more meaningful measure, e.g. portfolio returns.
- Suggestion: report both the mean and the median.
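To see the difference in sensitivity, the following R sketch replaces one observation with an outlier and recomputes both measures. The wage values are made up for illustration.

```r
# Hypothetical hourly wages in dollars (symmetric sample).
wage <- c(10, 11, 12, 13, 14)

# Same sample with the largest value replaced by an outlier.
wage_outlier <- c(10, 11, 12, 13, 64)

# Without the outlier, mean and median agree.
print(c(mean = mean(wage), median = median(wage)))

# With the outlier, the mean moves from 12 to 22; the median stays at 12.
print(c(mean = mean(wage_outlier), median = median(wage_outlier)))
```

Reporting both numbers, as suggested above, makes this kind of skewness or outlier visible to the reader.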
The Quartiles Q1 and Q3
- To calculate the quartiles:
o Arrange the observations in increasing order and locate the median M in
the ordered list of observations.
o The first quartile Q1 is the median of the observations whose position in
the ordered list is to the left of the location of the overall median.
o The third quartile Q3 is the median of the observations whose position in