20 Jan 2017


Descriptive Statistics

In descriptive statistics, plots and numerical summaries are used to describe a data set.

Plots for Categorical Variables

A categorical variable is a variable that falls into one of two or more categories.

Examples: A university student’s major, the province in which a Canadian resides, a person’s blood type.

We illustrate the distribution of a categorical variable by displaying the count or proportion

of observations in each category. We might use bar charts, Pareto diagrams (an ordered bar

chart) or pie charts.

Example. Gun calibre for buyback guns and guns used in homicides and suicides. (From

a buyback program in Milwaukee.)

Gun Calibre   Buybacks   Homicides   Suicides
Small              719          75         40
Medium             182         202         72
Large               20          40         13
Other               20          52          0
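The proportions in each category can be computed directly from the counts in the table. The following is a minimal sketch; the names `counts`, `proportions`, and `relative` are chosen here for illustration and do not come from the notes.

```python
# Counts copied from the gun calibre table above.
counts = {
    "Buybacks":  {"Small": 719, "Medium": 182, "Large": 20, "Other": 20},
    "Homicides": {"Small": 75,  "Medium": 202, "Large": 40, "Other": 52},
    "Suicides":  {"Small": 40,  "Medium": 72,  "Large": 13, "Other": 0},
}


def proportions(category_counts):
    """Return each category's share of the total count (count / n)."""
    n = sum(category_counts.values())
    return {category: count / n for category, count in category_counts.items()}


# One set of proportions per group, so groups of different sizes can be compared.
relative = {group: proportions(c) for group, c in counts.items()}

for group, props in relative.items():
    row = "  ".join(f"{cat}: {p:.3f}" for cat, p in props.items())
    print(f"{group:<10} {row}")
```

Proportions, rather than raw counts, are what make the three groups comparable, since the groups contain very different numbers of guns.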

[Figure: (a) Bar chart for the homicide guns (frequency by calibre: Small, Medium, Large, Other). (b) Pie chart for the homicide guns. (c) Side-by-side bar charts of relative frequency by calibre for buybacks, homicides, and suicides. (d) Stacked bar charts of relative frequency by calibre for buybacks, homicides, and suicides.]
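A Pareto diagram is described above as an ordered bar chart. As a rough sketch, the ordering step can be shown with the homicide counts from the table; the helper names `pareto_order` and `text_bar_chart` are hypothetical, and the text rendering is only a stand-in for a real plot.

```python
# Homicide gun counts from the gun calibre table in this section.
homicides = {"Small": 75, "Medium": 202, "Large": 40, "Other": 52}


def pareto_order(category_counts):
    """Return (category, count) pairs sorted from most to least frequent."""
    return sorted(category_counts.items(), key=lambda item: item[1], reverse=True)


def text_bar_chart(pairs, scale=5):
    """Render one row per category; each '#' stands for roughly `scale` observations."""
    lines = []
    for category, count in pairs:
        lines.append(f"{category:<7}{'#' * (count // scale)} {count}")
    return "\n".join(lines)


ordered = pareto_order(homicides)
print(text_bar_chart(ordered))
```

Sorting the bars puts the most common category first, which makes the dominant categories easy to pick out.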

Do the exercises in Chapter 3.

Note: In a pie chart, the area of each slice is the proportion for that category, where proportion = frequency / n.

Plots for Quantitative Variables

A quantitative variable is a numeric variable that represents a measurable quantity. Examples: the height of a student, the length of stay in hospital after a certain type of surgery, the weight of a newborn African elephant.

For quantitative variables, numerical summaries such as averages have meaning.

To illustrate the distribution of a quantitative variable, we plot the different values of the

variable and how often these values occur. This can be done in a variety of ways, including:

histograms, stem plots, boxplots, and dot plots.

To create a histogram, we first create a frequency table. In a frequency table, a quantitative variable is divided into a number of classes (also known as bins), and the class boundaries and frequency of each class are listed.
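As an illustration of the frequency table just described, here is a minimal sketch. The function name `frequency_table`, the equal-width left-closed classes, and the sample values are all assumptions made for this example; the notes do not specify a boundary convention, and the values are not the guinea pig data.

```python
def frequency_table(values, start, width, n_classes):
    """Count how many values fall in each class [lower, upper).

    Classes are equal-width and left-closed, which is one common convention;
    other conventions place boundary values in the lower class instead.
    """
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        frequency = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, frequency))
    return table


# Hypothetical survival times in days, made up for illustration only.
sample = [43, 74, 81, 88, 95, 102, 110, 126, 139, 154, 176, 215, 253, 341, 522]

table = frequency_table(sample, start=0, width=100, n_classes=6)
for lower, upper, frequency in table:
    print(f"{lower:>3} to <{upper:<3}  {frequency}")
```

The class frequencies should add up to the number of observations, which is a quick check that the classes cover the whole range of the data.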

A histogram is a plot of the class frequencies, relative frequencies, or percent relative frequencies against the class boundaries (or class midpoints).
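The three vertical scales mentioned above differ only by a constant factor. A small sketch, using hypothetical class boundaries and frequencies (not the guinea pig data) and a text rendering in place of a real plot:

```python
def relative_frequencies(frequencies):
    """Convert class frequencies to relative frequencies (frequency / n)."""
    n = sum(frequencies)
    return [f / n for f in frequencies]


# Hypothetical class boundaries and frequencies, made up for illustration only.
boundaries = [0, 100, 200, 300, 400]
frequencies = [4, 9, 5, 2]

rel = relative_frequencies(frequencies)
percent = [100 * r for r in rel]  # percent relative frequencies

# One row per class: boundaries, a bar of stars, the frequency, and the percent.
for lower, upper, f, p in zip(boundaries, boundaries[1:], frequencies, percent):
    print(f"[{lower:>3}, {upper:>3})  {'*' * f:<10} {f:>2}  {p:5.1f}%")
```

Because the scales are proportional, the shape of the histogram is the same whichever one is used; only the axis labels change.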

Histogram of survival times (days) for 60 guinea pigs injected with tubercle bacilli:

[Figure: histogram with Lifetime on the horizontal axis and Frequency on the vertical axis.]

Histograms and stemplots allow us to see the distribution of the data (what values the

variable takes on, and how often it takes on these values).
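A stemplot can be built by splitting each value into a stem (the leading digits) and a leaf (the final digit). The sketch below assumes non-negative values; the function name `stemplot`, the `leaf_unit` parameter, and the scores are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stemplot(values, leaf_unit=1):
    """Return stemplot lines: stem = leading digits, leaf = final digit.

    Each value is divided by `leaf_unit` and truncated to an integer, then
    split into a stem (all but the last digit) and a leaf (the last digit).
    Assumes non-negative values.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        scaled = int(v // leaf_unit)
        leaves[scaled // 10].append(scaled % 10)
    lines = []
    # Include empty stems so that gaps in the data remain visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines


# Hypothetical test scores, made up for illustration only.
scores = [52, 58, 61, 64, 64, 67, 70, 73, 75, 75, 78, 93]
print("\n".join(stemplot(scores)))
```

Unlike a histogram, the stemplot keeps the individual data values visible while still showing the overall shape.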

(Stemplots are also known as stem-and-leaf displays.)

Some things to take note of:

- Where is it centred? (We'll use the mean and median as measures of centre.)

- How much variability is there? (We will use the variance and standard deviation as measures of variability.)

- Are there any outliers? (extreme values)

- What is the shape? (important)
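The measures of centre and variability named in the list can be computed with Python's standard `statistics` module. This is a minimal sketch on hypothetical data; the notes have not yet defined these quantities, so the sample versions (divisor n - 1 for the variance) are used here as an assumption.

```python
import statistics

# Hypothetical measurements, made up for illustration only.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

print(f"mean = {mean}, median = {median}")
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
```

The standard deviation is in the same units as the data, which is why it is often easier to interpret than the variance.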

Some common distribution shapes.

[Figure: (a) A symmetric distribution. (b) A bimodal distribution.]

[Figure: (a) A distribution that is skewed to the right. (b) A distribution that is skewed to the left.]

Figures illustrating right and left skewness.

[Figure: (a) Histogram of the lifetimes of 60 guinea pigs (Lifetime against Frequency). Time-to-event data is often right skewed. (b) Histogram of the birth weights of a random sample of 1000 Canadian male births in 2009 (Weight in grams against Frequency).]

Real world data sets illustrating right and left skewness.

This distribution is:

- roughly symmetric

- unimodal (one peak)

- approximately normal (bell-shaped)

Alternative shapes:

- not symmetric

- right skewed: often seen in time-to-event data, incomes, and housing prices

- left skewed: also known as negatively skewed
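The terms "negatively skewed" and "positively skewed" refer to the sign of a numerical skewness measure. The notes do not define one, so the sketch below uses the moment-based coefficient m3 / m2^1.5 as an assumption; the function name `sample_skewness` and the data are hypothetical.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using divisor n for both moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical values, made up for illustration only.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]
left_skewed = [-v for v in right_skewed]  # mirror image of the first list

for name, values in [("right", right_skewed), ("left", left_skewed)]:
    print(name, round(sample_skewness(values), 2),
          statistics.mean(values), statistics.median(values))
```

In this example the long right tail pulls the mean above the median, and the mirrored data show the opposite pattern. That relationship is typical of skewed data but is not guaranteed for every data set.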