# Econ 2B03 Ch1, Ch2.docx

McMaster University

Economics

ECON 2B03

Bridget O' Shaughnessy

Fall

Econ 2B03
Chapter 1&2: Presenting Data: Summary Statistics
Sep 8, 2011
Course Website: http://socserv.mcmaster.ca/racinej/2B03/
Statistics
- Is a branch of mathematics dealing with the collection, presentation, and
interpretation of data.
- The field can be divided into two broad categories:
Descriptive statistics, which seeks to describe general characteristics
of a set of data.
Inferential statistics, which seeks to draw inferences about (unknown)
features of a population based on a (known) sample drawn from that
population.
- Broadly speaking, statistics provides a rigorous framework for describing,
analyzing and drawing inference on the basis of data.
Q: Why do we need statistics?
A: Mainly because of uncertainty (we cannot know many things in life with
certainty, even though they clearly exist).
e.g. The actual avg height of all people in the world at a given time cannot
be measured directly, so we use ‘sample’ information, which is limited.
- In any statistical study, we analyze information/data obtained from
individuals.
- Individuals can be any objects of interest (e.g. people, plants).
- A variable is any characteristic of an individual (e.g. age, income).
- A variable varies among individuals, as opposed to a constant, which does
not.
- The distribution of a variable tells us what values the variable takes and
how often it takes these values.
- Population: Set of all possible observations on some characteristic (e.g.
age, height, income).
- Sample: Subset of a population.
- Random sample: One obtained if every member of the population has an
equal chance of being in the sample.
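As a minimal sketch of the population/sample distinction above (in Python, purely for illustration; the notes themselves use R), the block below draws a simple random sample from a made-up population. The population values, the seed, and the sample size of 50 are all assumptions chosen for the example.

```python
import random

# Hypothetical population: heights (cm) of 1,000 individuals, invented for illustration.
rng = random.Random(2011)  # fixed seed so the sketch is reproducible
population = [round(rng.gauss(170, 10), 1) for _ in range(1000)]

# Simple random sample without replacement: every member of the
# population has an equal chance of being in the sample.
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)  # usually unknown in practice
sample_mean = sum(sample) / len(sample)              # what we can actually compute

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The two means will generally differ a little; that gap is the uncertainty that inferential statistics tries to quantify.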
Two Types of Population
- Categorical population: A population whose characteristic is inherently
non-numerical (e.g. sex, race).
- Quantitative population: A population whose characteristic is numerical.
- For categorical populations, we are typically interested in proportions (e.g.
the proportion who are male).
- For quantitative populations, we are typically interested in e.g. the avg and
how ‘spread-out’ the variable is.
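A small Python sketch (for illustration only) of the two kinds of summary just described: a proportion for a categorical variable, and an average plus one simple measure of spread (the range) for a quantitative variable. The data values are invented, and the range is only one of several possible spread measures.

```python
# Hypothetical data for five individuals, invented for illustration.
sex = ["male", "female", "female", "male", "female"]   # categorical variable
income = [42000, 55000, 38000, 61000, 47000]           # quantitative variable

# Categorical: the proportion who are male.
prop_male = sex.count("male") / len(sex)

# Quantitative: the average, and the range as a simple measure of spread.
avg_income = sum(income) / len(income)
range_income = max(income) - min(income)

print(prop_male)     # 0.4
print(avg_income)    # 48600.0
print(range_income)  # 23000
```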
Types of Categorical Data
- Nominal Data
Numbers merely label differences in kind
Cannot be manipulated arithmetically
Can only be counted (e.g. 0=male, 1=female)
- Ordinal Data
Label differences in kind
Order or rank observations based on importance
No meaningful arithmetic analysis possible (e.g. ranking firms
according to profit)
R: www.r-project.org
R-studio: www.rstudio.org
Sep 12, 2011
Presenting Data: Tables & Graphs
- First split data into groups/classes
- Split depends on data type
- Data types:
Categorical (nominal or ordinal)
Quantitative (discrete or continuous)
- Data Classes should be:
Collectively exhaustive (must exhaust all logical possibilities for
classifying available data)
Mutually exclusive (must not overlap or have data in common)
- Desirable Class number:
Should fit data type
Often recommended: between 5 and 20
Sturges' Rule: desirable number of classes = k, where k is the integer
closest to (use the rules for rounding)
$1 + 3.3 \log_{10} n$
where n is the sample size and $\log_{10} n$ is the power to which the base (10)
is raised to yield n.
- Desirable class widths:
Class widths = the difference between lower & upper limits of a class
To achieve uniform class widths in a table, divide data set width by
desirable class number.
- Approx class width:
(largest value – smallest value) / desirable class number
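The rule for the desirable class number and the approximate class width formula above can be sketched in Python as follows (illustration only). The function names and the sample data are hypothetical; rounding is done half-up to follow the usual rules for rounding.

```python
import math

def sturges_classes(n: int) -> int:
    """Desirable number of classes: the integer closest to 1 + 3.3 * log10(n)."""
    k = 1 + 3.3 * math.log10(n)
    return math.floor(k + 0.5)  # round half up

def approx_class_width(data, k: int) -> float:
    """(largest value - smallest value) / desirable class number."""
    return (max(data) - min(data)) / k

print(sturges_classes(100))   # 1 + 3.3 * 2 = 7.6, so 8 classes

# Hypothetical data set whose values run from 12 to 92.
data = [12, 35, 47, 58, 60, 71, 92]
print(approx_class_width(data, 8))  # (92 - 12) / 8 = 10.0
```

In practice the computed width is often rounded to a convenient number so that the class limits are easy to read.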
Tabular Components
- First consider creating effective tabular summaries
- An effective table includes:
Number (often based on chapter or page no.)
Title (focus on what, where, when)
Caption (brief verbal summary)
Footnotes (e.g. size of sampling error)
Decimals (consistent rounding rules)
Class sums (sum of data pertaining to each class)
Frequency Distributions
- An effective way to summarize data
- Absolute Frequency Distributions
Absolute class frequency:
Absolute no. of observations that fall into given class
Absolute frequency distribution:
Tabular summary of a data set
Shows absolute no. of observations that fall into each of several data
classes
- Relative Frequency Distributions
Relative class frequency:
Ratio of a particular class’s no. of observations to the total no. of
observations made
Relative frequency distribution:
Tabular summary of a data set
Shows proportions of all observations that fall into each of several
data classes
Relative class frequency = absolute class frequency / total no. of observations
- Cumulative Frequency Distributions
Cumulative class frequency
The sum of all class frequencies up to & including the class in
question
- LE distribution (typical, called ‘cumulative’ distribution)
LE: Less than or equal to upper class limit
Moves from lesser to greater class
- ME distribution (less used, often called ‘survivor’ distribution)
ME: more than or equal to lower class limit
Moves from greater to lesser class
Sep 13, 2011
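The absolute, relative, and cumulative (LE and ME) frequency distributions defined above can be tabulated with a few lines of Python (illustration only). The data and the three class limits are invented, and the convention that a class excludes its lower limit and includes its upper limit is an assumption made so that the classes are mutually exclusive.

```python
from itertools import accumulate

# Hypothetical sample of 10 observations and three classes.
data = [3, 7, 12, 15, 18, 21, 22, 25, 28, 29]
classes = [(0, 10), (10, 20), (20, 30)]  # assumed: lower < x <= upper

# Absolute class frequencies: number of observations in each class.
absolute = [sum(1 for x in data if lo < x <= hi) for lo, hi in classes]

# Relative class frequencies: absolute frequency / total no. of observations.
total = sum(absolute)
relative = [f / total for f in absolute]

# LE cumulative: sum of frequencies up to and including each class.
cumulative_le = list(accumulate(absolute))

# ME cumulative: sum of frequencies from each class through the last class.
cumulative_me = list(accumulate(reversed(absolute)))[::-1]

print(absolute)       # [2, 3, 5]
print(relative)       # [0.2, 0.3, 0.5]
print(cumulative_le)  # [2, 5, 10]
print(cumulative_me)  # [10, 8, 5]
```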
Cross Tabulations
- Are tabular summaries for two variables
- One variable is represented by row heads
- The other variable is represented by column heads
- Information for both variables is entered in table cells
- In R we use the xtabs( ) function
e.g. Suppose that 7 out of 10 males are admitted to engineering school while
4 of 10 females are admitted
                  Admission
Sex            +(Yes)   -(No)
+(Male)           7        3
-(Female)         4        6
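As a rough Python counterpart to this cross tabulation (illustration only; it is not the R xtabs( ) call itself), the sketch below rebuilds the 20 individual records implied by the example and counts them by sex and admission outcome. The record layout and variable names are assumptions.

```python
from collections import Counter

# One (sex, admitted) record per applicant, reconstructed from the example counts.
records = (
    [("Male", "Yes")] * 7 + [("Male", "No")] * 3
    + [("Female", "Yes")] * 4 + [("Female", "No")] * 6
)

# Cross tabulation: count of records in each (row, column) cell.
table = Counter(records)

print(f"{'Sex':<8}{'Yes':>5}{'No':>5}")
for sex in ("Male", "Female"):
    print(f"{sex:<8}{table[(sex, 'Yes')]:>5}{table[(sex, 'No')]:>5}")

# Row proportions: share of each sex that was admitted.
admit_rate = {
    sex: table[(sex, "Yes")] / (table[(sex, "Yes")] + table[(sex, "No")])
    for sex in ("Male", "Female")
}
print(admit_rate)  # {'Male': 0.7, 'Female': 0.4}
```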
Graph Types
- Core Graphs
Two-dimensional graphs
E.g. histograms, scatter diagrams, etc.
- Specialty Graphs
Combine elements from core graphs to display data in unique ways
E.g. bar graphs, pie charts etc.
- 3D graphs
Display with height, width and depth
- Frequency Histograms
Horizontal axis
Identifies upper and lower limits of each data class
Vertical axis
Shows number of observations in each class (absolute frequencies), or
the ratio of class frequency to class width (frequency density) when
class widths differ
Rectangles
Represent corresponding class frequencies by area, or by height (if
all class intervals are alike)
In R we use the hist( ) function
- Frequency Polygons
Portray frequency distributions as many sided figures
Class mark: avg of two class limits
End points on the horizontal axis lie half a class width below the lowest
(or above the highest) class limit
- Density Estimator
Portray frequency distributions as smooth curves
Remove irregularities from histograms depicting info gathered in
sample surveys
Estimate how histograms would appear if census info were graphed
with many tiny classes
In R we use density( )
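One common way to obtain such a smooth curve is a kernel density estimate. The Python sketch below (illustration only) assumes a Gaussian kernel and a bandwidth picked by hand; neither choice comes from the notes, and the sample values and the function name `gaussian_kde` are hypothetical. R's density( ) selects a bandwidth automatically, so its output will not match this sketch exactly.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density at x using a Gaussian kernel."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Average of small bell curves, one centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density

# Hypothetical sample of 10 observations.
sample = [2.1, 2.9, 3.4, 3.8, 4.0, 4.3, 4.9, 5.6, 6.2, 7.5]
f = gaussian_kde(sample, bandwidth=0.8)  # bandwidth is an assumption

# Evaluate on a grid from -5 to 15 and check that the curve integrates to about 1.
step = 0.1
grid = [i * step for i in range(-50, 151)]
area = sum(f(x) for x in grid) * step
print(round(area, 3))
```

A smaller bandwidth follows the sample more closely (closer to a histogram with tiny classes); a larger one gives a smoother curve.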
Graphing Two Variables
- Scatter plot diagrams graph the relationship between 2 quantitative
variables
- Each dot represents one observation: its value on one variable is measured
on the horizontal axis and its value on the other variable on the vertical axis
- R: plot( )
Ways to chart categorical data
- Bar graph: each category is represented by a bar, R: barplot( )
- Pie chart: each category is represented by a slice, R: pie( )
Sep 15, 2011
Summary Statistics: Symbolic Expression
- Population Parameters (typically unknown)
Summary statistics based on population data
Designated by Greek letters (e.g. μ, σ)
- Sample Statistics (computed from sample)
Summary statistics based on sample data
Designated by Roman letters (e.g. $\bar{X}$, s)
- Observed values
Traditionally represented by x or X
Different values indicated by subscripts 1, 2, 3 etc. (e.g. $X_1, X_2, X_3, \ldots$)
- Observation Totals
Population total = N
Sample total = n
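To make the notation concrete, here is a small Python sketch (illustration only) that computes population parameters and the corresponding sample statistics. It assumes, as is conventional, that μ and σ denote the population mean and standard deviation and that s denotes the sample standard deviation; the data values are invented and the sample is fixed by hand rather than drawn at random.

```python
import statistics

# Hypothetical population and a sample taken from it.
population = [4, 8, 6, 5, 3, 7, 9, 6]
sample = [4, 6, 3, 9]

N = len(population)  # population total
n = len(sample)      # sample total

# Population parameters (typically unknown in practice).
mu = statistics.mean(population)       # population mean
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Sample statistics (computed from the sample).
x_bar = statistics.mean(sample)        # sample mean
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(N, n)          # 8 4
print(mu, x_bar)     # 6 5.5
print(round(sigma, 4), round(s, 4))
```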
Summation Notation
- In statistics, we often need to ‘accumulate’ or ‘sum’ some
