Published on 12 Mar 2012

Lecture 7 March 5, 2012

Analyzing Quantitative Data

Descriptive Statistics

Provide a summary or visual picture of the data

Most common ones look at frequency distributions

o See raw numbers or percentages

Also can look at charts and graphs

Variables

Discrete – fixed set of values or value attributes

Continuous – infinite number of values, usually on a continuum

Levels of measurement:

o Nominal

o Ordinal

o Interval

o Ratio

Choosing Measures of Central Tendency

Use the mode when…

o Variables are nominal, ordinal, interval, or ratio

o You want a quick and easy measure

o You want to report the most common score

Ex: 5 8 9 2 8 3 7 4 7 0 3 8 3 1 5

Mode: 3 & 8

Bimodal distribution - a distribution with two modes (here, 3 and 8 each occur three times)
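The mode example above can be checked with a short Python sketch. It uses the standard library's `statistics.multimode`, which returns every value tied for the highest frequency; the variable names are illustrative only.

```python
from statistics import multimode

# Scores from the lecture example
scores = [5, 8, 9, 2, 8, 3, 7, 4, 7, 0, 3, 8, 3, 1, 5]

# multimode returns all values tied for most frequent; sort them for display
modes = sorted(multimode(scores))
print(modes)  # [3, 8]
```

Two values come back, so this set of scores is bimodal.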

Use the median when…

o Variables are ordinal, interval, or ratio

o Variables at the interval-ratio level have highly skewed distributions

o You want to report the central score

Ex: 3 8 14 19 27 28 46

Median: 19

Ex: 15 19 21 30 36 45 48 58

Median: 33
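Both median examples can be reproduced with a minimal Python sketch using the standard library's `statistics.median`. With an odd number of scores the median is the middle score; with an even number it is the average of the two middle scores (here 30 and 36). The variable names are illustrative only.

```python
from statistics import median

odd_scores = [3, 8, 14, 19, 27, 28, 46]        # 7 scores: middle one is the 4th
even_scores = [15, 19, 21, 30, 36, 45, 48, 58]  # 8 scores: average the 4th and 5th

median_odd = median(odd_scores)
median_even = median(even_scores)

print(median_odd)   # 19
print(median_even)  # 33.0
```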

Use the mean when…

o Variables are interval-ratio

o You want to report a typical score

o You anticipate additional statistical analyses

Measure of Dispersion/Variation

Range

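o Ex: $25 000, $32 000, $48 000, $55 000: range = $55 000 - $25 000 = $30 000

o Can be used with ordinal, interval, or ratio variables (not nominal, because the values must be ordered)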

Percentiles

o Can be used with ordinal, interval, or ratio variables (not nominal, because the values must be ordered)

Standard deviations (s or SD)

o City A: s = $1 782

o City B: s = $4 920

o City C: s = $19 467

o Can be used with interval or ratio
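As a rough sketch, the range and standard deviation can be computed in Python as follows. The four salaries are the ones listed under Range; applying the standard deviation to them is an assumption made here for illustration (the city figures in the notes come from data not shown). `statistics.stdev` computes the sample standard deviation (dividing by N - 1).

```python
from statistics import stdev

# Salaries from the Range example in the notes
salaries = [25_000, 32_000, 48_000, 55_000]

# Range: highest value minus lowest value
salary_range = max(salaries) - min(salaries)

# Sample standard deviation (s): typical distance of a score from the mean
salary_sd = stdev(salaries)

print(salary_range)  # 30000
print(round(salary_sd))
```

A larger s (as in City C versus City A) means the scores are more spread out around the mean.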

The Empirical Rule

About 68% of scores fall within ± 1s of the mean

About 95% fall within ± 2s

About 99.7% fall within ± 3s
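For a normal distribution, the share of scores within k standard deviations of the mean can be computed directly. The sketch below uses the identity P(|Z| <= k) = erf(k / √2) from the standard library's `math` module; the function name `share_within` is hypothetical.

```python
from math import erf, sqrt

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Print the share as a percentage, rounded to one decimal place
    print(k, round(100 * share_within(k), 1))
```

The three shares come out at roughly 68%, 95%, and 99.7%.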

Units of Standard Deviation

Z Scores

o Always have the same values for their mean and standard deviation (mean = 0, s = 1)

o Allow you to compare two or more distributions or groups

o Describe the individual score relative to the group

o Quantify how far a score is from the mean, in units of standard deviation

One- versus Two-Tailed Tests?

How Would I Use a Z Score?

Suppose, for example, that you took the same class as your friend, but you had different instructors. Your final grade was 76% and your friend got 82%. Intuitively, it might seem that your friend did better than you. But what if the class he/she took was easier than yours?

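Your class: Mean = 54%, s = 20%, so z = (76-54) / 20 = 1.1

Friend’s class: Mean = 72%, s = 15%, so z = (82-72) / 15 = 0.67

Relative to your own class, your score (z = 1.1) is further above the mean than your friend’s (z = 0.67), so you did better compared to your classmates.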
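The comparison above can be sketched in Python. The helper `z_score` is a hypothetical name; it applies the formula z = (score - mean) / s to the figures given for the two classes.

```python
def z_score(score, mean, sd):
    """Distance of a score from the group mean, in standard deviation units."""
    return (score - mean) / sd

your_z = z_score(76, 54, 20)    # your class: mean 54%, s 20%
friend_z = z_score(82, 72, 15)  # friend's class: mean 72%, s 15%

print(round(your_z, 2))    # 1.1
print(round(friend_z, 2))  # 0.67
print(your_z > friend_z)   # True
```

Because both scores are now on the same scale, they can be compared even though the two classes had different means and spreads.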

Inferential Statistics

Used to:

o Generalize from the sample to the population

o Test hypotheses

o Test whether descriptive results are due to chance or reflect a real pattern in the population

o Sampling becomes really important!

If the sample is not representative, it is difficult to generalize to the population because the data are biased

Confidence Intervals

Add a level of assurance to your estimates

Provide a range within which the population value is likely to fall

We usually leave a 5% chance of error (a 95% confidence level)

Comes up during political polling

Confidence Interval Example

A study of the leisure activities of Americans was conducted on a sample of 1000 households. The respondents identified TV as the major form of leisure activity.

If the sample reported an average of 6.2 hours of TV per day, with a standard deviation of 0.7, what is the estimate of the population mean?

The information from the sample is…

Mean = 6.2

SD = 0.7

N = 1000

Alpha is set at 0.05, so Z = ± 1.96

C.I. = 6.2 ± 0.04

Based on this we could estimate that the population watches an average of 6.2 ± 0.04 hours of TV per day. Thus, the interval would be 6.16 – 6.24 hours per day.

E = Z score x s / √N

E = 1.96 x 0.7 / (√1000)

E = 1.96 x 0.022

E = 0.04
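The margin-of-error calculation can be reproduced with a short Python sketch. The variable names are illustrative; the values are the ones from the TV example, and Z = 1.96 corresponds to alpha = 0.05.

```python
from math import sqrt

mean = 6.2   # sample mean, hours of TV per day
sd = 0.7     # sample standard deviation
n = 1000     # sample size
z = 1.96     # Z score for a 95% confidence level (alpha = 0.05)

# Margin of error: E = Z * s / sqrt(N)
margin = z * sd / sqrt(n)

lower = mean - margin
upper = mean + margin

print(round(margin, 2), round(lower, 2), round(upper, 2))  # 0.04 6.16 6.24
```

Because N sits under a square root, quadrupling the sample size only halves the margin of error.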

Type I and Type II Errors

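Does the patient have AIDS? (Null hypothesis: the patient does not have AIDS)

Null is true, Reject Null: Type I (Alpha) error. Test shows the patient has AIDS, but the patient actually does not (false positive)

Null is true, Fail to Reject Null: Correct decision

Null is false, Reject Null: Correct decision

Null is false, Fail to Reject Null: Type II (Beta) error. Test shows the patient doesn’t have AIDS, but the patient actually does (false negative)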

*Objective is to test whether there is enough evidence to reject the null hypothesis

Minimizing Type I and II Errors

It’s ultimately about balance

o Proper methodological procedures

o Good sampling techniques

For example, to avoid making a Type II error…

o Increase the Alpha level, thereby increasing the chance of making a Type I error

o Increase the sample size
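The trade-off above can be illustrated with a small Python sketch. It assumes a one-tailed Z test with a known standard deviation and a hypothetical true effect of 0.5 standard deviations; these settings and the function name `type_ii_error` are assumptions for illustration, not part of the lecture. It uses `statistics.NormalDist` (Python 3.8+).

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, n, effect=0.5, sd=1.0):
    """Beta for a one-tailed Z test, assuming the true mean is `effect` above the null value."""
    std_normal = NormalDist()
    # Z value the sample result must exceed in order to reject the null
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Probability of falling short of that cutoff even though the null is false
    return std_normal.cdf(critical_z - effect * sqrt(n) / sd)

baseline = type_ii_error(alpha=0.05, n=20)
higher_alpha = type_ii_error(alpha=0.10, n=20)
larger_sample = type_ii_error(alpha=0.05, n=40)

print(higher_alpha < baseline)   # True
print(larger_sample < baseline)  # True
```

Raising alpha lowers beta at the cost of more Type I errors, while a larger sample lowers beta without changing alpha.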

Hypothesis Testing

AKA Significance Testing

Goal: to decide (with a known probability of error) if a sample has certain characteristics in your study

“Statistically significant”

Results are not likely due to chance
