STAB22H3 Chapter Notes - Chapter 1-6: The Leaves, Bar Chart, Contingency Table

8 Feb 2016
Statistics is a way of reasoning, along with a collection of tools and methods, designed to help us understand the world.
Statistics (plural) are particular calculations made from data (values along with their context)
Data vary so statistics is about variation. Data vary b/c we can’t see everything let alone measure it all, and even what we
do see and measure, we measure imperfectly
Statistics makes sense of the world by allowing us to understand and model the variation so that we can see the underlying
Best way to understand statistics is to see it at work posing questions about the world
Collecting data on their customers, transactions and sales lets companies track their inventory and help them predict what
their customers prefer
This data can be used to help companies predict what their customers will likely buy in the future so they can determine
how much of each item to stock…also information given in data can be used to improve customer service
Data are useless without their context. The context can be set by answering the 5W’s & H questions. The two most
important questions are Who and What: if you can’t answer “Who” and “What”, you don’t have data and you don’t have any
useful information
Data in table 2.1 on page 8 has no context, which is why we can’t understand what the figures mean. We can make the
meaning clear if we organize the values into a data table (table 2.2 p9). Rows in a data table answer the “who” question
The rows of a data table correspond to individual cases about whom/which we record some characteristics
Cases go by different names:
Respondents: individuals who answer a survey
Subjects/participants: people on whom we experiment
Experimental units: animals, plants, web sites and other inanimate subjects
Records/cases: the rows in a data table
We often refer to data values as observations w/o being clear about the who
o Unless you know the “who” of the data you won’t be able to understand what the data mean
To be able to generalize from the sample of cases selected from some larger population that we’d like to
understand, we want the sample to be representative of that population
What and Why
Variables: the characteristics recorded about each individual usually shown as the columns of a data table and they have
a name that identifies what has been measured
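A minimal sketch of a data table in Python, assuming a made-up set of customer records (the field names and values are hypothetical, not taken from the textbook's tables). Each dictionary is one case (a row, the “who”) and each key is a variable (a column, the “what”).

```python
# Hypothetical data table: each dict is one case (row), each key is a variable (column).
data_table = [
    {"customer_id": "C001", "age_years": 34, "shipping_method": "standard"},
    {"customer_id": "C002", "age_years": 19, "shipping_method": "express"},
    {"customer_id": "C003", "age_years": 52, "shipping_method": "standard"},
]

# The variable names are the column headings.
variables = list(data_table[0].keys())

# Reading down one column gives every case's value for that variable.
ages = [case["age_years"] for case in data_table]

print(variables)
print(ages)
```

Note that age_years carries its units in the name; without them the numbers would have no meaning.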
Variables play different roles, and you can’t tell a variable’s role just by looking at it. Some variables just tell us what
group or category each individual belongs to
Some variables have units, which tell how each value has been measured. The units can tell us how much of something we
have, or how far apart two values are. Without units the values of a measured variable have no meaning
Two types of variables that we need to understand:
Categorical variable/qualitative variables: a variable that names categories and answers questions about how
cases fall into those categories
o If the values of variables are words rather than numbers, it’s highly likely that they are categorical variables
Quantitative variables: measured variables with units that answer questions about the quantity of what is measured
Some variables can be both categorical and quantitative
o Ex. a company could ask your age in years (seems quantitative). It would be quantitative if they wanted to
know the average age of those customers who visit their site after 3am, but if they want to figure out what
CD to offer you in a special deal then thinking of your age as belonging to one of the categories of child,
teen, adult or senior would be more useful
You must always look to the “Why” of your study to decide whether to treat a variable as categorical or quantitative
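The age example can be sketched in Python. The ages and the category cut-offs (13, 20, 65) are assumptions made up for illustration; the notes do not say where child, teen, adult and senior begin.

```python
from statistics import mean

# Hypothetical ages in years for five customers.
ages = [8, 15, 34, 41, 70]

# Quantitative use: the units (years) matter, so an average is meaningful.
average_age = mean(ages)

def age_category(age):
    """Map an age in years to a category; the cut-offs are illustrative assumptions."""
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    if age < 65:
        return "adult"
    return "senior"

# Categorical use: only the group each customer falls into matters.
categories = [age_category(a) for a in ages]

print(average_age)
print(categories)
```

Same variable, two treatments: which one is right depends on the “Why” of the study.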
Variables that report order without natural units are called ordinal variables
The how of data refers to the methods used to collect the data
Counts Count
Counting is a natural way to summarize the categorical variable shipping method (refers to ex of Amazon’s special offer or
free shipping). The word “counts” doesn’t necessarily mean categorical
We use counts to measure the amounts of things ex. How many songs are in your iPod
We use counts in two different ways: when we count the cases in each category of a categorical variable, the category
labels are the “what” and the individuals counted are the “who” of our data
Identifying Identifiers
Identifier variables themselves don’t tell us anything useful about the categories because we know there is exactly one
individual in each, but they are crucial in this age of large data sets
They make it possible to combine data from different sources, to protect confidentiality and to provide unique labels.
They are CATEGORICAL variables with just one individual in each category ex. UPS tracking number, SIN #, Student Number
Important to recognize when variable plays role of identifier so you don’t analyze it
Just because a variable has one case per category doesn’t limit it to being an identifier
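A sketch of both points in Python, assuming made-up student numbers and two hypothetical sources keyed by them. The helper name has_one_case_per_category is an illustration, not a standard function. Distinct values only hint that a variable may be an identifier, but they are what lets records from different sources be matched.

```python
def has_one_case_per_category(values):
    """True when every value is distinct, so each category holds exactly one case."""
    values = list(values)
    return len(set(values)) == len(values)

# Hypothetical columns from a small data table.
student_numbers = ["1001", "1002", "1003"]
shipping_methods = ["standard", "express", "standard"]

# Two hypothetical sources keyed by the same identifier.
grades = {"1001": 78, "1002": 91, "1003": 64}
programs = {"1001": "Statistics", "1002": "Biology", "1003": "Economics"}

# The identifier makes it possible to combine the sources case by case.
combined = {
    number: {"grade": grades[number], "program": programs[number]}
    for number in student_numbers
}

print(has_one_case_per_category(student_numbers))   # every value is distinct
print(has_one_case_per_category(shipping_methods))  # "standard" repeats
print(combined["1002"])
```

Averaging or charting student_numbers would be meaningless, which is why the role has to be recognized before any analysis.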
Recall: in a data table, the rows represent cases and the columns represent variables
The problem with data tables is that you can’t see what’s going on: you can’t really identify patterns, relationships, trends or
The Three Rules of Data Analysis
1. Make a picture that will display your data in a way that will reveal things you aren’t likely to see in a table of
2. Make a picture that will show the important features and patterns in your data and show you things you did not
expect to see (unexpected patterns or extraordinary data values)
3. Make a picture that best communicates your data to others
Frequency Tables: Making Piles
To make a picture of data the first step is to make piles -> pile together things that seem to go together to see how the
cases distribute across different categories
For categorical data you just count the number of cases corresponding to each category and pile them up
Frequency table records the totals and the category names
Relative frequency table: displays the proportions or percentages, rather than the counts of the values in each category
Both types of frequency tables describe the distribution of a categorical variable b/c they name the possible categories and
tell how frequently each occurs
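A minimal sketch of “making piles” in Python, assuming ten made-up orders with a hypothetical shipping method variable. Counter from the standard library does the counting; the relative frequency table divides each count by the total.

```python
from collections import Counter

# Hypothetical categorical variable: shipping method chosen for ten orders.
shipping = ["standard", "express", "standard", "free", "free",
            "standard", "free", "express", "free", "standard"]

# Frequency table: category name -> count of cases.
frequency = Counter(shipping)

# Relative frequency table: category name -> percentage of all cases.
total = sum(frequency.values())
relative_frequency = {category: 100 * count / total
                      for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:10s} {count:3d} {relative_frequency[category]:6.1f}%")
```

The counts add up to the number of cases and the percentages add up to 100, which is a quick check that no case was dropped or counted twice.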
The Area Principle
Experience and psychological tests show that our eyes tend to be more impressed by the area than by other aspects of each
image (refer to figure 3.2 on p. 22)
Area principle: a fundamental principle of graphing data that states that the area occupied by a part of the graph should
correspond to the magnitude of the value it represents. Basically, a bigger value corresponds to a bigger area and a smaller
value corresponds to a smaller area
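A small numeric check of how a picture can break the area principle, assuming two made-up values where one is twice the other. A bar of fixed width keeps area in step with the value; an image stretched in both height and width does not.

```python
# Two hypothetical values to display; the second is twice the first.
small_value, large_value = 100, 200
value_ratio = large_value / small_value

# A bar of fixed width: only the height scales with the value.
bar_width = 1.0
bar_area_ratio = (large_value * bar_width) / (small_value * bar_width)

# A picture scaled in both height and width by the value ratio.
picture_area_ratio = value_ratio * value_ratio

print(value_ratio, bar_area_ratio, picture_area_ratio)
```

Here the value doubles and the bar's area doubles, but the scaled picture's area quadruples, so the eye reads the difference as larger than it is.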
Bar Charts
A bar chart obeys the area principle, so it gives an accurate visual impression of the distribution of values
Height of each bar shows the count for its category. Heights determine their areas and the areas are proportional to the counts
Bar chart: displays the distribution of a categorical variable, showing the counts for each category next to each other for
easy comparison. They have small spaces b/w each bar to show that the freestanding bars can be rearranged in any order
You can also use a relative frequency bar chart to draw attention to the relative proportion of the variable being measured
(ex. number of passengers aboard the Titanic falling into each class category). Simply replace the counts with percentages
and graph that.
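A rough text-only sketch of a bar chart and its relative frequency version, using made-up class counts (these are not the real Titanic figures). The helper text_bar_chart and its scale parameter are hypothetical names for illustration.

```python
# Hypothetical counts for a categorical variable (not the real Titanic figures).
counts = {"First": 15, "Second": 10, "Third": 25}

total = sum(counts.values())
percentages = {category: 100 * count / total for category, count in counts.items()}

def text_bar_chart(values, scale=1.0):
    """Return one line per category; bar length is proportional to the value."""
    lines = []
    for category, value in values.items():
        bar = "#" * round(value * scale)
        lines.append(f"{category:<8}{bar} {value:g}")
    return lines

# Bar chart of counts.
for line in text_bar_chart(counts):
    print(line)
print()
# Relative frequency bar chart: same bars, labelled with percentages.
for line in text_bar_chart(percentages, scale=0.5):
    print(line)
```

The two charts have the same shape; only the labels change from counts to percentages.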
Pie Charts
Pie charts: show the whole group of cases as a circle. Slice the circle into pieces whose size is proportional to the fraction
of the whole in each category
Pie charts give a quick impression of how a whole group is portioned off into smaller groups. If you are comparing many
categories, it is easier to display and communicate the data in a bar chart
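A short sketch of how the slices are sized, assuming made-up counts for three categories: each slice's angle is its fraction of the whole times 360 degrees.

```python
# Hypothetical counts for a categorical variable.
counts = {"First": 15, "Second": 10, "Third": 25}

total = sum(counts.values())

# Each slice's angle is its fraction of the whole times 360 degrees.
slice_degrees = {category: 360 * count / total
                 for category, count in counts.items()}

for category, degrees in slice_degrees.items():
    print(f"{category:<8}{degrees:6.1f} degrees")
```

The angles always add up to 360, so a pie chart only makes sense when the categories together make up the whole group.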