# SOAN 3120 Chapter Notes - Chapter 1: Exploratory Data Analysis, Time Series


13 Sep 2016

Course: SOAN3120

Professor: TM

Chapter 1 – Picturing Distributions with Graphs

1.1 – Individuals and Variables

- Data contain information about some group of individuals (usually people, but individuals can also be animals or things)

- The information is organized in variables; a variable is any characteristic of an individual

- Any set of data is accompanied by background information to help us understand it:

o Who? Which individuals do the data describe, and how many of them are there?

o What? How many variables are there, what are their exact definitions, and in what unit of measurement is each variable recorded?

o Where?

o When?

o Why? What purpose do the data serve?

- Two types of variables:

o Categorical Variables: Place an individual into one of several groups or categories

Ex. Sex

o Quantitative Variables: Take numerical values for which arithmetic operations such as adding and averaging make sense. Usually recorded with a unit of measurement

Ex. Weight (lbs)

- Most data tables are laid out as a spreadsheet, where each row is an individual and each column is a variable
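To make the row/column layout concrete, here is a minimal Python sketch. The table, its variable names `sex` and `weight_lbs`, and the helper `column` are all hypothetical, made up for illustration. Each individual is one row; pulling a variable out gives one column. Only the quantitative column is averaged, since an average of category labels would be meaningless.

```python
from statistics import mean

# Hypothetical data table, made up for illustration: each dict is one row
# (an individual) and each key is one column (a variable).
rows = [
    {"id": 1, "sex": "F", "weight_lbs": 132.0},
    {"id": 2, "sex": "M", "weight_lbs": 175.5},
    {"id": 3, "sex": "F", "weight_lbs": 140.0},
]


def column(table, variable):
    """Collect one variable's values across all individuals (one column)."""
    return [row[variable] for row in table]


weights = column(rows, "weight_lbs")  # quantitative: averaging makes sense
sexes = column(rows, "sex")           # categorical: values are group labels

print(f"Mean weight: {mean(weights):.1f} lbs")
print("Categories of sex:", sorted(set(sexes)))
```

The categorical column is summarized by listing its categories rather than by arithmetic, which matches the distinction between the two variable types above.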

1.2 – Categorical Variables: Pie Charts and Bar Graphs

- Exploratory Data Analysis: Using statistical tools and ideas to examine data in order to describe their main features

- Two main principles for organizing the exploration of data:

1. Begin by examining each variable by itself. Then move on to study the relationships among the variables

2. Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data

Describing a Single Variable

- The proper choice of graph depends on the nature of the variable

- The distribution of a variable tells us what values it takes and how often it takes these values

o For a categorical variable, the values are labels for the categories; therefore, the distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall into each category
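The distribution of a categorical variable can be tabulated directly from its values. The sketch below is a minimal illustration, assuming the data are a plain Python list of category labels; the function name `distribution` and the survey answers are hypothetical. It returns each category's count together with its percent of all individuals.

```python
from collections import Counter


def distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    total = len(values)
    if total == 0:
        return {}  # no individuals, so no distribution to report
    counts = Counter(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}


# Hypothetical survey answers, made up for illustration.
answers = ["Yes", "No", "Yes", "Undecided", "Yes", "No", "Yes", "No"]
for category, (count, percent) in distribution(answers).items():
    print(f"{category}: {count} ({percent:.1f}%)")
```

Because every individual falls into exactly one category here, the percents add up to 100.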

- Pie charts and bar graphs display the distribution of a categorical variable more vividly than a table of counts or percents

- Pie Charts: Show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories

o Must include all the categories that make up a whole

o Use only when you want to emphasize a category’s relation to the whole
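Sizing slices by counts means each slice's angle is proportional to its share of the total: angle = (count / total) × 360 degrees. The following minimal sketch computes those angles; the function name `pie_slice_angles` and the counts are hypothetical. The angles add up to a full 360 degrees only when the categories together make up the whole, which is why a pie chart must include all of them.

```python
def pie_slice_angles(counts):
    """Map each category to its pie-slice angle in degrees.

    Assumes the categories do not overlap and together make up the whole,
    which is the condition a pie chart needs.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must add up to a positive total")
    return {category: 360 * n / total for category, n in counts.items()}


# Hypothetical counts, made up for illustration.
angles = pie_slice_angles({"Yes": 4, "No": 3, "Undecided": 1})
for category, degrees in angles.items():
    print(f"{category}: {degrees:.1f} degrees")
```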

- Bar Graphs: Represent each category as a bar, with the bar heights showing the category counts or percents

o Easier to make and read than pie charts

o Often better to arrange the bars in order of height, so one can immediately see which category appears most often

o More flexible: can be used to compare quantities that are not parts of a whole

Ex. A question that allows multiple answers (What social media do you use?)
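The sketch below draws a text-only bar graph with the bars sorted tallest first, using a made-up multiple-answer question like the social media example. The function `text_bar_graph`, the platform names, and the 40-character bar width are hypothetical choices for illustration. Because one respondent can pick several platforms, the counts add up to more than the number of respondents, so the data are not parts of a single whole. A bar graph still works; a pie chart would not.

```python
from collections import Counter


def text_bar_graph(counts, width=40):
    """Return lines of a text bar graph with bars sorted tallest first."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    biggest = ordered[0][1] if ordered else 0
    lines = []
    for category, n in ordered:
        # Bar length is proportional to the count; the tallest bar gets `width`.
        bar = "#" * round(width * n / biggest) if biggest else ""
        lines.append(f"{category:<10} {bar} {n}")
    return lines


# Hypothetical multiple-answer survey, made up for illustration: each inner
# list is one respondent's answers.
responses = [
    ["Facebook", "Instagram"],
    ["Instagram"],
    ["Facebook", "Twitter", "Instagram"],
    ["Snapchat"],
]
counts = Counter(platform for answer in responses for platform in answer)
for line in text_bar_graph(counts):
    print(line)
```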

- Pie charts and bar graphs are mainly tools for presenting data: they help the audience grasp data quickly

o They are of limited use for data analysis, because data on a single categorical variable are easy to understand without a graph

1.3 – Quantitative Variables: Histograms

- Quantitative variables often take many values
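When a quantitative variable takes many values, a common first step toward a histogram is to group the values into classes of equal width and count how many fall in each class. This is a minimal sketch under assumed choices: the function `histogram_classes`, the exam scores, the starting point 50, the width 10, and the rule that a value on a class boundary goes into the higher class are all hypothetical. The one firm requirement is that each value lands in exactly one class.

```python
import math


def histogram_classes(values, start, width):
    """Group quantitative values into equal-width classes for a histogram.

    Classes are [start, start + width), [start + width, start + 2*width), ...
    so a value on a boundary goes into the higher class and each value
    lands in exactly one class. Empty classes are kept with a count of 0.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if not values or min(values) < start:
        raise ValueError("need at least one value, all >= start")
    n_classes = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_classes
    for x in values:
        counts[math.floor((x - start) / width)] += 1
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(n_classes)]


# Hypothetical exam scores, made up for illustration.
scores = [52, 67, 71, 74, 78, 81, 85, 88, 90, 95]
for low, high, count in histogram_classes(scores, start=50, width=10):
    print(f"[{low}, {high}): {'*' * count} {count}")
```

Each row of stars plays the role of one histogram bar, with its length showing how many scores fall in that class.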

find more resources at oneclass.com
