# Textbook Chapter 1.docx

University of Guelph

Sociology and Anthropology

SOAN 3120

Michelle Dumas

Fall

Chapter 1: Picturing Distributions with Graphs
Statistics is the science of data
Individuals and Variables
Any set of data contains information about some group of individuals.
Individuals are the objects described by a set of data; they may be people,
but they can also be animals or things
The information is organized in variables. A variable is any characteristic
of an individual; a variable can take different values for different
individuals
When planning a statistical study ask yourself the questions:
1. Who? What individuals do the data describe? How many individuals appear
in the data?
2. What? How many variables do the data contain? What are the exact
definitions of these variables? In what unit of measurement is each variable
recorded?
3. Where? Student GPAs and SAT scores will vary from college to college
depending on many variables, including admissions “selectivity” for the
college
4. When? Students change from year to year, as do prices, salaries etc.
5. Why? What purpose do the data have? Do we hope to answer some specific
questions? Do we want answers from just these individuals or for some
larger group that these individuals are supposed to represent? Are the
individuals and variables suitable for the intended purpose?
A categorical variable places an individual into one of several groups or
categories
A quantitative variable takes numerical values for which arithmetic
operations such as adding and averaging make sense; the values of a
quantitative variable are usually recorded in a unit of measurement such as
seconds or kilograms
Most data tables follow this format – each row is an individual, and each
column is a variable
Spreadsheets are commonly used to enter and transmit data and do simple
calculations
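As a rough illustration of the row-and-column layout described above, here is a minimal Python sketch. The table `students`, its values, and the helper `variable_type` are all hypothetical and made up for illustration. The rule used (numeric values mean quantitative) is only a crude assumption: numeric codes such as postal codes would still be categorical, because averaging them makes no sense.

```python
# Hypothetical data table: each dict (row) is one individual,
# each key (column) is a variable. All values are made up.
students = [
    {"major": "Sociology", "height_cm": 170.0},
    {"major": "Biology", "height_cm": 162.5},
    {"major": "Sociology", "height_cm": 181.0},
]


def variable_type(rows, column):
    """Crude rule: 'quantitative' if every value is a number, else 'categorical'."""
    values = [row[column] for row in rows]
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"


# Classify each column of the table.
types = {col: variable_type(students, col) for col in students[0]}

# Averaging makes sense only for the quantitative variable.
mean_height = sum(row["height_cm"] for row in students) / len(students)
```

Here `types` maps each column name to its kind of variable, and `mean_height` is a summary that only a quantitative variable supports.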
Categorical Variables: Pie Charts and Bar Graphs
Statistical tools and ideas help us examine data in order to describe their
main features; this examination is called exploratory data analysis
There are two principles that help us organize the exploration of a set of
data
1. Begin by examining each variable by itself, then move on to study the
relationships among the variables
2. Begin with a graph or graphs, then add numerical summaries of specific
aspects of the data
The proper choice of graph depends on the nature of the variable
To examine a single variable we usually want to display its distribution
o Distribution of a variable tells us what values it takes and how often it
takes these values
o The values of a categorical variable are labels for the categories. The
distribution of a categorical variable lists the categories and gives
either the count or the percent of individuals who fall in each category
It’s a good idea to check data for consistency
o The percents should add to 100%, but the total may come out slightly
different (for example, 99.9%) because each percent is rounded; this is
called roundoff error
o Roundoff errors don’t point to mistakes in our work, just to the effect
of rounding off results
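The distribution of a categorical variable and the roundoff effect can be sketched in a few lines of Python. The list `majors` and the helper `categorical_distribution` are hypothetical, and the data are made up so that three equal categories each round to 33.3%.

```python
from collections import Counter

# Hypothetical category labels for six individuals (made-up data).
majors = ["Arts", "Science", "Business", "Arts", "Science", "Business"]


def categorical_distribution(labels, digits=1):
    """Return {category: (count, percent rounded to `digits` decimals)}."""
    counts = Counter(labels)
    total = len(labels)
    return {cat: (n, round(100 * n / total, digits)) for cat, n in counts.items()}


dist = categorical_distribution(majors)

# Add up the rounded percents; the total need not be exactly 100.
percent_total = round(sum(pct for _, pct in dist.values()), 1)
```

With these made-up counts, each category holds one third of the individuals, so the rounded percents total 99.9 rather than 100. The gap comes from rounding, not from an error in the counts.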
Pie Charts
o Show the distribution of a categorical variable as a “pie” whose slices
are sized by the counts or percents for the categories
o A pie chart must include all the categories that make up a whole
o Use a pie chart only when you want to emphasize each category’s
relation to the whole
Bar Graphs
o Represent each category as a bar
o The bar heights show the category counts or the percents
o Bar graphs are easier to make than pie charts and also easier to read
o It is often best to arrange the bars in order of height; that way we can
see immediately which categories (majors, in the textbook example) appear
most often
o Bar graphs are more flexible than pie charts: both graphs can display the
distribution of a categorical variable, but a bar graph can also
compare any set of quantities that are measured in the same units
Bar graphs and pie charts are mainly tools for presenting data: they help
your audience grasp data quickly
Since it is easy to understand data on a single categorical variable without a
graph, bar graphs and pie charts are of limited use for data analysis
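To make the advice about ordering bars by height concrete, here is a small text-only sketch in Python. The function `text_bar_graph` and the counts are hypothetical; a plotting library would normally be used instead, but plain text keeps the example self-contained.

```python
def text_bar_graph(counts, symbol="#"):
    """Return one text line per category, tallest bar first."""
    # Sort categories by count so the most common category comes first.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Pad labels to the same width so the bars line up.
    width = max(len(cat) for cat in counts)
    return [f"{cat.ljust(width)} | {symbol * n} {n}" for cat, n in ordered]


# Hypothetical counts per category (made-up data).
counts = {"Arts": 3, "Science": 7, "Business": 5}
lines = text_bar_graph(counts)
print("\n".join(lines))
```

Each bar's length is the category count, so the ordering alone shows which category is most common.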
Quantitative Variables: Histograms
Quantitative variables often take on many values
The distribution tells us what values the variable takes on and how often it
takes these values
A graph of the distribution is clearer if nearby values are grouped
together
The most common graph of the distribution of one quantitative variable is a
histogram
Although histograms resemble bar graphs, their details and uses are different
o A histogram displays the distribution of a quantitative variable
o The horizontal axis of a histogram is marked in the units of measurement
for the variable
o A bar graph compares the sizes of different quantities
o The horizontal axis of a bar graph need not have any measurement
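Grouping nearby values together, the step behind a histogram, can be sketched as counting how many observations fall into equal-width classes. The function `histogram_counts`, the class boundaries, and the data below are all hypothetical choices for illustration; values outside the chosen range are simply ignored in this sketch.

```python
def histogram_counts(values, start, width, n_classes):
    """Count values in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class containing v
        if 0 <= k < n_classes:         # skip values outside the covered range
            counts[k] += 1
    return counts


# Hypothetical quantitative data (made-up values, e.g. times in minutes).
times = [12, 15, 21, 22, 24, 29, 30, 31, 38, 45]

# Classes 10-20, 20-30, 30-40, 40-50; each left endpoint is included.
counts = histogram_counts(times, start=10, width=10, n_classes=4)
```

The resulting `counts` list gives the bar heights, and the class boundaries sit on a horizontal axis marked in the variable's own units.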
