# SOAN 3120 Chapter Notes - Chapter 1: Exploratory Data Analysis, Time Series

13 Sep 2016
Chapter 1 – Picturing Distributions with Graphs
1.1 – Individuals and Variables
- Data includes information about some group of individuals (usually people, but can also be animals or
things)
- The information is organized in variables
- Any set of data is accompanied by background information to help us understand it:
o Who? What individuals do the data describe, and how many?
o What? How many variables are there, what are their exact definitions, and in what unit of
measurement is each variable recorded?
o Where?
o When?
o Why? What purpose do the data have?
- Two types of variables:
o Categorical Variables: Place an individual into one of several groups or categories
Ex. Sex
o Quantitative Variables: Take numerical values for which arithmetic operations such as adding
and averaging make sense. Usually recorded with a unit of measurement
Ex. Weight (lbs)
- Most data tables are laid out in a spreadsheet, where each row is an individual, and each column is a
variable
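The row-per-individual, column-per-variable layout can be sketched in a few lines of Python. This is only an illustration: the names, values, and the helpers `variable_type` and `mean` are made up for the example and do not come from the course material.

```python
# Hypothetical data table: each row (dict) is one individual,
# each key is a variable. All values are invented for illustration.
table = [
    {"name": "A", "sex": "F", "weight_lbs": 140.0},
    {"name": "B", "sex": "M", "weight_lbs": 180.0},
    {"name": "C", "sex": "F", "weight_lbs": 130.0},
]


def variable_type(rows, column):
    """Rough heuristic: all-numeric values -> quantitative, else categorical."""
    values = [row[column] for row in rows]
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"


def mean(rows, column):
    """Averaging only makes sense for a quantitative variable."""
    values = [row[column] for row in rows]
    return sum(values) / len(values)


print(variable_type(table, "sex"))         # categorical
print(variable_type(table, "weight_lbs"))  # quantitative
print(mean(table, "weight_lbs"))           # 150.0
```

The type check is only a heuristic: a variable stored as numbers (a postal code, for instance) is still categorical if adding or averaging its values makes no sense.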
1.2 – Categorical Variables: Pie Charts and Bar Graphs
- Exploratory Data Analysis: Using statistical tools and ideas to examine data in order to describe their
main features
- Two main principles for organizing the exploration of data:
1. Begin by examining each variable by itself. Then move on to study the relationships among the
variables
2. Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data
Describing a Single Variable
- The proper choice of graph depends on the nature of the variable
- The distribution of a variable tells us what values it takes and how often it takes these values
o For a categorical variable the values are labels for the categories, so its distribution lists
the categories and gives either the count or the percent of individuals who fall into each category
- Pie charts and bar graphs display the distribution of a categorical variable more vividly
- Pie Charts: Show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or
percents for the categories
o Must include all the categories that make up a whole
o Use only when you want to emphasize a category’s relation to the whole
- Bar Graphs: Represent each category as a bar, with the bar heights showing the category counts or percents
o Easier to make and read than pie charts
o Often better to arrange the bars in order of height, allowing one to immediately see which
category appears most often
o More flexible, can use to compare quantities that are not part of a whole
Ex. A question with possible multiple answers (What social media do you use?)
- Pie charts and bar graphs are mainly tools for presenting data: they help the audience grasp data quickly
o Limited use for data analysis because it is easy to understand data on a single categorical variable
without a graph
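As a rough sketch of the ideas in this section, the Python below tabulates the distribution of a categorical variable (counts and percents), draws a text bar graph ordered by height, and shows a multiple-answer question whose percents do not add up to a whole. The data and the helper names `distribution` and `text_bar_graph` are invented for illustration only.

```python
from collections import Counter


def distribution(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]


def text_bar_graph(rows):
    """Render one bar per category; bar length equals the category count."""
    width = max(len(cat) for cat, _, _ in rows)
    return "\n".join(
        f"{cat:<{width}} | {'#' * n} {n} ({pct:.1f}%)" for cat, n, pct in rows
    )


# Made-up responses for a single categorical variable (one answer each).
majors = ["Sociology"] * 4 + ["Anthropology"] * 3 + ["Other"]
rows = distribution(majors)
print(text_bar_graph(rows))

# Made-up multiple-answer question: "What social media do you use?"
# Each respondent may name several platforms, so the percents describe
# separate quantities rather than parts of one whole.
answers = [
    {"Instagram", "Facebook"},
    {"Facebook"},
    {"Instagram", "Twitter", "Facebook"},
    {"Twitter"},
]
usage = Counter(platform for person in answers for platform in person)
percent_using = {p: 100.0 * n / len(answers) for p, n in usage.items()}
print(percent_using)
```

In the first example the percents sum to 100, so either a pie chart or a bar graph would work. In the second they sum to more than 100, which is why only a bar graph is appropriate there.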
1.3 – Quantitative Variables: Histograms
- Quantitative variables often take many values