Chapter 1: What is Statistics?
- Way to get info from data
- Descriptive statistics: deals with methods of organizing, summarizing and presenting data in a convenient and
informative way (e.g., graphing techniques)
- Use numerical techniques to summarize data; average
- Average is a measure of central location
- Range is measure of variability
- Inferential Statistics: methods used to draw conclusions/inferences about characteristics of population based on
sample data
- Exit polls: a random sample of voters leaving the polling booth is asked for whom they voted; the sample proportion of voters
who supported each candidate is computed
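A minimal sketch of the ideas above, using invented numbers (the marks and the exit-poll responses are hypothetical, not from the course). It computes a mean and a range as descriptive summaries, then a sample proportion of the kind an exit poll would report.

```python
# Hypothetical data, invented for illustration only.
marks = [62, 75, 81, 58, 90, 75, 69]

# Descriptive statistics: summarize the data we actually have.
mean_mark = sum(marks) / len(marks)      # measure of central location
range_mark = max(marks) - min(marks)     # measure of variability

# Inferential statistics starts from a sample statistic.
# Hypothetical exit poll: each entry is the candidate one sampled voter chose.
exit_poll = ["A", "B", "A", "A", "B", "A", "B", "A", "A", "B"]
sample_proportion_a = exit_poll.count("A") / len(exit_poll)

print(f"mean = {mean_mark:.2f}, range = {range_mark}")
print(f"sample proportion for A = {sample_proportion_a:.2f}")
```

The sample proportion is only an estimate of the population proportion; a different random sample of voters would generally give a different value.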
Key Statistical Concepts
- Population: group of all items of interest; very large; does not necessarily refer to group of people
- Parameter: descriptive measure of a population; e.g., mean # of soft drinks consumed by all students at the university or
proportion of the 5 million voters who voted for Bush
- Sample: set of data drawn from the studied population
- Statistic: descriptive measure of a sample; used to make inferences about parameters
- Statistical Inference: process of making an estimate/prediction/decision about a population based on sample data;
because such conclusions are not always correct, a measure of reliability is attached
- Confidence level: proportion of times that an estimating procedure will be correct
- Significance level: measures how frequently conclusion will be wrong
- Hw pg 39, 47, 57-62, 69-72, 195, 197, 202, 204, 91-94, 100-104, 110-112, 118-125
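The confidence level and significance level defined in this chapter can be illustrated by simulation. The sketch below is an assumption-heavy illustration, not part of the notes: it draws many samples from a normal population with a known mean and standard deviation (all values invented), builds the usual known-sigma interval around each sample mean with z = 1.96, and counts how often the interval contains the true mean.

```python
import random
import statistics

def coverage(trials=2000, n=30, mu=50.0, sigma=10.0, z=1.96, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    half_width = z * sigma / n ** 0.5    # known-sigma interval (an assumption)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

confidence = coverage()
print(f"interval captured the parameter in {confidence:.1%} of samples")
print(f"interval missed the parameter in {1 - confidence:.1%} of samples")
```

With z = 1.96 the first figure should come out near 95%, which is the sense in which a confidence level is the proportion of times the estimating procedure is correct.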
Chapter 2: Graphical Descriptive Techniques 1
Types of Data and Information
- Variable: some characteristic of a population or sample (e.g., prices of stocks, which vary daily)
- Values: possible observations of the variable; e.g., the integers from 0 to 100 for a statistics exam marked out of 100
- Data: observed values of a variable; midterm test marks of 10 students; datum refers to mark of one student
Interval data: real numbers; heights, weights, incomes, distances (quantitative or numerical)
Nominal data: values of nominal data are categories; responses to questions about marital status
(qualitative/categorical); only calculations based on frequencies or percentages of occurrence are
valid
Ordinal data: appear categorical, but the order of the values has meaning (e.g., a higher value indicates a higher rating); only calculations
based on the ordering process are valid
Intervals (differences) between values of interval data are consistent and meaningful; those between values of ordinal data are not
Calculation for Types of Data
Interval Data: all calculations permitted; set of interval data described by using the average
Nominal Data: calculations based on codes used to store this type of data are meaningless; compute percentages of
occurrences of each category
Ordinal Data: only permissible calculations should involve ranking process
Describing Set of Nominal Data
- Frequency distribution: presents the categories and their counts; relative frequency distribution lists the categories and the
proportions with which each occurs
- Bar graph shows frequencies, pie chart shows relative frequencies
- For ordinal data, arrange the bars of a bar graph in ascending/descending order of the values; arrange pie chart wedges clockwise in
ascending/descending order
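A frequency distribution and a relative frequency distribution for nominal data can be sketched as follows. The marital-status responses are invented for illustration; `Counter` is from Python's standard library.

```python
from collections import Counter

# Hypothetical survey responses (nominal data), invented for illustration.
responses = ["single", "married", "married", "divorced", "single",
             "married", "widowed", "married", "single", "married"]

frequency = Counter(responses)                 # category -> count
n = len(responses)
relative_frequency = {cat: count / n for cat, count in frequency.items()}

# Print one row per category, most frequent first.
for cat, count in frequency.most_common():
    print(f"{cat:<10}{count:>3}{relative_frequency[cat]:>8.0%}")
```

The counts are what a bar graph would display and the proportions are what a pie chart would display. No average of the categories is computed, since only frequencies and percentages are valid for nominal data.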
Describing relationship between 2 nominal variables
- Univariate: techniques applied to single sets of data
- Bivariate: methods that depict relationship between variables
- Cross-classification table: describes relationship between 2 nominal variables; lists frequency of each combination of
values of the 2 variables
- If the two variables are unrelated, the patterns exhibited in the bar charts should be approximately the same
Comparing 2 or more nominal data sets
- Consider the three occupations (newspaper example) as defining 3 populations; if differences exist between columns
of frequency distributions (or between bar charts), then differences exist among the three populations
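A cross-classification table can be built by counting each combination of values. The sketch below uses invented (occupation, newspaper) pairs; the occupation labels and newspaper names are hypothetical stand-ins for the newspaper example.

```python
from collections import Counter

# Hypothetical (occupation, newspaper) pairs, invented for illustration.
observations = [
    ("blue collar", "Post"), ("blue collar", "Post"), ("blue collar", "Star"),
    ("white collar", "Star"), ("white collar", "Star"), ("white collar", "Post"),
    ("professional", "Star"), ("professional", "Star"), ("professional", "Star"),
]

counts = Counter(observations)
rows = sorted({occupation for occupation, _ in observations})
cols = sorted({paper for _, paper in observations})

# Cross-classification table: frequency of each combination of values.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Row relative frequencies, so occupations of different sizes can be compared.
row_props = {
    r: {c: table[r][c] / sum(table[r].values()) for c in cols} for r in rows
}

for r in rows:
    print(r, table[r], {c: round(p, 2) for c, p in row_props[r].items()})
```

If the row proportions look roughly the same for every occupation, the two variables appear unrelated; clear differences between rows suggest the populations differ.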
Chapter 3: Graphical Descriptive Techniques 2
Graphing techniques to describe set of interval data
- Create frequency distribution for interval data by counting the number of observations that fall into each of a series of
intervals, called classes, that cover the range of observations
- Intervals should be equal; graphing and interpretation made easier
- Histogram: created by drawing rectangles whose bases are intervals and heights are frequencies
Determining number of class intervals
- # of class intervals = 1 + 3.3 log10(n), where n is the number of observations (Sturges’s formula)
- Class interval width = (largest observation – smallest observation) / # of classes
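The two formulas above can be sketched in code. The data are invented, and two choices here are assumptions rather than part of the notes: the logarithm is taken to base 10 and the result is rounded to the nearest integer, and the largest observation is placed in the last class. In practice the width is usually rounded up to a convenient number.

```python
import math

def sturges_classes(n):
    """Number of class intervals from Sturges's formula, rounded to an integer."""
    return round(1 + 3.3 * math.log10(n))

def frequency_distribution(data):
    """Return (classes, counts) using equal-width classes. Assumes data vary."""
    k = sturges_classes(len(data))
    low, high = min(data), max(data)
    width = (high - low) / k             # class interval width
    counts = [0] * k
    for x in data:
        # The largest observation would index one past the end, so clamp it.
        i = min(int((x - low) / width), k - 1)
        counts[i] += 1
    classes = [(low + i * width, low + (i + 1) * width) for i in range(k)]
    return classes, counts

# Hypothetical observations, invented for illustration.
bills = [12, 15, 22, 27, 31, 35, 38, 41, 44, 47,
         52, 55, 58, 63, 67, 72, 78, 85, 93, 112]
classes, counts = frequency_distribution(bills)
for (lo, hi), c in zip(classes, counts):
    print(f"{lo:6.1f} to {hi:6.1f}: {c}")
```

The counts are the heights of the histogram rectangles, and the class limits are their bases.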
Shapes of histograms
- Histogram is symmetric when, if we draw a vertical line down the centre, the two sides are identical in size and shape
- Skewness: histogram with a long tail extending to either the right (positively skewed) or the left (negatively skewed)
- Modal class: class with the largest # of observations
- Unimodal histogram: one with a single peak
- Bimodal histogram: one with two peaks, not necessarily equal in height; often indicates that 2 different distributions are present
- Bell shape: special symmetric, unimodal histogram
- Stem-and-Leaf display: method that overcomes the loss of information about the actual observations that occurs when a histogram groups them into classes
Similar to histogram on its side; length of each line represents frequency in class interval defined by stems;
actual observations can be seen
- Relative frequency distribution: divide frequencies by # of observations
- Cumulative relative frequency distribution: shows the proportion of observations that lie below each of the class limits
- Ogive: graphical representation of cumulative relative frequencies
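A small sketch of a stem-and-leaf display and of the cumulative relative frequencies that an ogive plots. The marks are invented, and the stem is assumed to be the tens digit with the units digit as the leaf, which suits two-digit non-negative integers only.

```python
from collections import defaultdict
from itertools import accumulate

def stem_and_leaf(data):
    """Map each stem (tens and above) to its sorted leaves (units digit)."""
    display = defaultdict(list)
    for x in sorted(data):
        display[x // 10].append(x % 10)
    return dict(display)

def cumulative_relative(frequencies):
    """Running total of the frequencies divided by the number of observations."""
    n = sum(frequencies)
    return [c / n for c in accumulate(frequencies)]

# Hypothetical marks, invented for illustration.
marks = [52, 67, 71, 58, 85, 74, 79, 63, 91, 77]

display = stem_and_leaf(marks)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")

# The length of each line is the frequency of that class.
class_counts = [len(leaves) for leaves in display.values()]
print(cumulative_relative(class_counts))
```

Each printed line keeps the actual observations visible, while the final list gives the heights of the ogive at the upper limit of each class.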
Describing time-series data
- Classify data according to whether observations are measured at the same time or whether they represent measurements at
successive points in time
- The former are called cross-sectional data (observations at same point in time); the latter are called time series data
(observations taken over time)
- Line chart: plot of the variable over time; graphically depicts time series data
Describing relationship between 2 interval variables
- Scatter plot used to describe relationship between two interval variables
Patterns of Scatter Plots
- Linearity: linear relationship if most points fall close to a straight line
- Direction: positive and negative relationships (CORRELATION IS NOT CAUSATION)
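Direction and linearity are judged by eye from a scatter plot, but they can also be summarized numerically. The sketch below computes the coefficient of correlation, a common summary not defined in these notes up to this point, on invented data: the sign gives the direction, and a magnitude near 1 means the points fall close to a straight line.

```python
import math

def correlation(xs, ys):
    """Pearson coefficient of correlation for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys_up = [2.1, 3.9, 6.2, 7.8, 10.1]     # y tends to rise with x
ys_down = ys_up[::-1]                  # y tends to fall with x

print(f"positive relationship: r = {correlation(xs, ys_up):.3f}")
print(f"negative relationship: r = {correlation(xs, ys_down):.3f}")
```

A value of r near +1 or -1 describes how the two variables move together; it says nothing about whether one causes the other.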
Graphical Deception
- Graph without a scale
- Do not be influenced by a graph's caption
- Change scale on y-axis to make a change look more dramatic (greater slope; numerically same); add a break; shrink the x-axis
- Show stability by stretching the x-axis; spreading points to increase distance thereby slope becomes less steep
- Bar graphs with width proportional to height exaggerate the data, since the area grows faster than the height
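The effect of changing the y-axis scale can be shown with simple arithmetic. In this sketch the sales figures and axis limits are invented, and `apparent_height_fraction` is a hypothetical helper name: it returns the share of the plotted y-axis height that a given change occupies.

```python
def apparent_height_fraction(y_start, y_end, axis_min, axis_max):
    """Fraction of the y-axis height spanned by a change from y_start to y_end."""
    return abs(y_end - y_start) / (axis_max - axis_min)

# Hypothetical sales rising from 98 to 100 units.
full_axis = apparent_height_fraction(98, 100, axis_min=0, axis_max=100)
cut_axis = apparent_height_fraction(98, 100, axis_min=97, axis_max=101)

print(f"y-axis from 0 to 100:  change fills {full_axis:.0%} of the height")
print(f"y-axis from 97 to 101: change fills {cut_axis:.0%} of the height")
```

The data are numerically the same in both cases; only the axis limits differ, yet the second graph makes the rise look far steeper.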
Chapter 4: Numerical Descriptive Techniques
- Parameter is a descriptive measurement about a population
Ways to describe center of set of data
- Mean: average;