SOAN 3120

FINAL EXAM

STUDY GUIDE

Wednesday, September 14

Displaying Distributions

Lying with Statistics

- It is possible to lie with statistics

o We expect statistics to come from experts who produce them for specific purposes

o People without statistical knowledge may not interpret data properly

o Given the raw data and the ability to examine them graphically, careful analysis makes deception more difficult

Knowing statistics makes you less likely to be duped

- Always be critical when evaluating a study

o There is always potential for falsification of data, omissions, and selective presentation of data

o Replication is important

Without it, we cannot verify the results of earlier studies

- Plotting data gives a visual representation that can forewarn of certain problems in the data

Basic Definitions

- Data sets are made up of cases

o Cases can be people, groups of people, countries, corporations, etc.

- Cases are described by variables

o A variable is something that can take on different values for different individuals

o Individuals can vary on certain characteristics (age, gender, SES)

- Categorical variables are differentiated by name only

o Ex. Gender, occupation, age groups

- Quantitative variables take on numerical values in a given unit of measure

o Ex. Age (years), weight (lbs), height (cm)

- The distribution of a variable tells us the range of values it takes and how frequently it takes them

o That is, the highest and lowest values within the distribution, and how often the variable takes on each value

Looking at Data

- Usually organized as tables in which rows represent units of observation and columns represent variables

o Rows: ex. individuals, countries

o Columns: ex. age, gender, income
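
The row-and-column layout and the idea of a distribution can be sketched in Python. This is only an illustration: the four cases and their values are made up, and the names `cases` and `column` are hypothetical, not something from the lecture.

```python
from collections import Counter

# Hypothetical data set: each row (dict) is one case, each key is a variable.
cases = [
    {"age": 21, "gender": "F", "income": 18000},
    {"age": 34, "gender": "M", "income": 52000},
    {"age": 21, "gender": "F", "income": 23000},
    {"age": 47, "gender": "M", "income": 61000},
]


def column(rows, variable):
    """Return the values of one variable across all cases."""
    return [row[variable] for row in rows]


ages = column(cases, "age")

# The distribution of a variable: which values occur, and how often.
age_distribution = Counter(ages)
lowest, highest = min(ages), max(ages)

print(ages)                  # [21, 34, 21, 47]
print(lowest, highest)       # 21 47
print(age_distribution[21])  # 2
```

Reading down one key gives a variable; reading across one dict gives a case.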

Two Types of Categorical Data

- Nominal variables have no intrinsic order

o Ex. Regions

- Ordinal variables have a natural order

o Ex. Survey attitude statements (Strongly Agree, Somewhat Disagree)
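
One way to see the nominal/ordinal difference is to try sorting. The sketch below assumes a five-point agreement scale; only "Strongly Agree" and "Somewhat Disagree" appear in the notes, so the other labels, the `SCALE` list, and the responses are hypothetical.

```python
# Assumed five-point scale, listed from least to most agreement.
SCALE = [
    "Strongly Disagree",
    "Somewhat Disagree",
    "Neutral",
    "Somewhat Agree",
    "Strongly Agree",
]
rank = {label: position for position, label in enumerate(SCALE)}

# Hypothetical survey responses.
responses = ["Somewhat Agree", "Strongly Disagree", "Strongly Agree", "Neutral"]

# Ordinal: the categories have a natural order, so sorting by rank is meaningful.
ordered = sorted(responses, key=rank.get)
print(ordered)

# Nominal: regions can be sorted alphabetically, but that order is only a
# convention and says nothing about the regions themselves.
regions = sorted(["West", "East", "North"])
print(regions)
```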

Types of Quantitative Data

- Counts are non-negative integers (whole numbers)

o Ex. Population

- Amounts are also non-negative, but need not be integers (also called ratio variables, since meaningful ratios of two values can be formed)

o Ex. Ratios that can be compared across cases (8x…5x…)

- Relative frequencies (proportions, percents, and rates) have both minimum and maximum values

o Ex. Infant mortality rate: 1000 x (number of infants who died in their first year of life) / (number of live births)

- Interval scales have an arbitrary zero point (e.g., zero Celsius), so we can compare ratios of differences, or intervals, but cannot form meaningful ratios of the values themselves

o Cannot say 20 degrees Celsius is twice as hot as 10 degrees Celsius because the zero point is arbitrary (it does not correspond to zero heat)

- Methods of analysis often depend on the nature of the variables
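
Two of the points above can be checked with a little arithmetic. The sketch below is an illustration only: the death and birth counts are made-up numbers, and the function names are hypothetical. The Kelvin conversion is used because Kelvin has a true zero, so it shows what the ratio of the two temperatures looks like once the arbitrary zero point is removed.

```python
def infant_mortality_rate(infant_deaths, live_births):
    """Deaths in the first year of life per 1,000 live births."""
    return 1000 * infant_deaths / live_births


# Hypothetical counts, for illustration only.
imr = infant_mortality_rate(infant_deaths=50, live_births=10_000)
print(imr)  # 5.0


def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15


# Ratio of values: depends on where zero is placed, so it is not meaningful
# on the Celsius (interval) scale.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print(ratio_celsius)           # 2.0
print(round(ratio_kelvin, 3))  # 1.035

# Ratio of differences: the same on both scales, so it is meaningful.
diff_ratio_celsius = (40 - 20) / (20 - 10)
diff_ratio_kelvin = (celsius_to_kelvin(40) - celsius_to_kelvin(20)) / (
    celsius_to_kelvin(20) - celsius_to_kelvin(10)
)
print(diff_ratio_celsius)           # 2.0
print(round(diff_ratio_kelvin, 6))  # 2.0
```

So 20 degrees Celsius is only about 3.5% "hotter" than 10 degrees Celsius on a scale with a true zero, while a 20-degree gap is twice a 10-degree gap on either scale.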

Categorical Data

- Analysis involves little more than simple counts and percentages over the categories

- Bar graphs or pie charts are rarely worth the effort, and the latter presume exhaustive categories

o Meaning you need to know what the whole pie is in order to talk about its pieces; all the data need to be there
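
The counts and percentages described above take only a few lines to compute. This is a minimal sketch with made-up data; the `regions` list is hypothetical.

```python
from collections import Counter

# Hypothetical nominal variable: region of residence for ten respondents.
regions = [
    "East", "West", "West", "North", "East",
    "West", "South", "West", "East", "North",
]

counts = Counter(regions)
total = sum(counts.values())
percentages = {category: 100 * n / total for category, n in counts.items()}

# Print a small frequency table, most common category first.
for category, n in counts.most_common():
    print(f"{category:<6}{n:>3}{percentages[category]:>7.1f}%")
```

Because every respondent falls into exactly one listed category, the percentages sum to 100. That is the "exhaustive categories" condition a pie chart relies on.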

Quantitative Data

- Since few individuals have exactly the same value of a quantitative variable, bar graphs divide its range into equal class intervals

- A histogram shows the count or percentage falling into each interval (the horizontal axis gives the range of the variable, the vertical axis gives the count or percent)

o A stem-and-leaf plot is a quick way to make a histogram if you have only a small amount of data

The left side shows the stems, the right side shows the leaves
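
Both displays can be sketched in a few lines of Python. This is an illustration under assumptions: the ages are made up, the function names are hypothetical, and the stem-and-leaf version assumes non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf.

```python
from collections import Counter, defaultdict


def histogram_counts(values, start, width):
    """Count values in equal class intervals [lower, lower + width).

    Returns a dict keyed by each interval's lower bound; empty intervals
    are left out.
    """
    counts = Counter((v - start) // width for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}


def stem_and_leaf(values):
    """Group values by stem (tens digit); leaves are the ones digits."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


# Hypothetical ages, in years.
ages = [21, 23, 25, 30, 34, 34, 38, 41, 47, 52]

print(histogram_counts(ages, start=20, width=10))
# {20: 3, 30: 4, 40: 2, 50: 1}

for stem, leaves in stem_and_leaf(ages).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
# 2 | 135
# 3 | 0448
# 4 | 17
# 5 | 2
```

Turned on its side, the stem-and-leaf plot has the same shape as the histogram, but it keeps the individual values visible.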

