# CHAPTER 1 NOTES

Examining Distributions

-in any graph of data, look for the overall pattern and for dramatic deviations from that pattern

-describe the overall pattern by its shape, center, and spread

-an important kind of deviation is an outlier, an individual value that falls outside the overall pattern

-describe the center of a distribution by its midpoint, the value with roughly half the observations taking smaller values and half taking larger values

-describe the spread of a distribution by giving the smallest and largest values (the quartiles Q1 and Q3, covered in 1.2, describe spread in more detail)

Stemplots and histograms display this. A stemplot turned on its side looks like a histogram, with the larger values lying to the right.
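
To make the idea of a stemplot concrete, here is a minimal Python sketch. It assumes non-negative whole numbers, with the last digit as the leaf and the remaining digits as the stem; the function name `stemplot` and the scores are made up for illustration.

```python
from collections import defaultdict


def stemplot(values):
    """Return the lines of a stemplot for non-negative whole numbers.

    Assumption: the leaf is the last digit and the stem is everything before it.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)

    lines = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}")
    return lines


scores = [62, 71, 75, 78, 80, 83, 84, 84, 89, 91, 105]  # made-up data
for line in stemplot(scores):
    print(line)
```

Reading the printed rows from top to bottom gives the same picture as a histogram turned on its side.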

-Describing shape:

•Does the distribution have one or several major peaks, called modes? Unimodal: one peak

•Is it symmetric or skewed? Symmetric: the values smaller and larger than the midpoint are mirror images of each other. Ex. heights of young women. Skewed: one tail extends much farther than the other. Ex. money amounts are usually skewed to the right (long right tail).

-outliers: look for points that are clearly apart from the body of data, not just the most extreme observations in a distribution. Sometimes they point to errors made in recording the data.

-time plots (pg. 19): a time plot of a variable plots each observation against the time at which it was measured. Always put time on the horizontal scale (x axis) of your plot and the variable you are measuring on the vertical scale (y axis). Connecting the points helps show change over time. When data are collected over time, plot the observations in time order. Displays such as stemplots and histograms ignore time order, so they can be misleading when there is systematic change over time.

-time series: measurements of a variable taken at regular intervals over time. Government, economic, and social data are often published as time series. Ex. the monthly unemployment rate and the quarterly gross domestic product. Time plots reveal the main features of a time series.

-in a time series:

•Trend: a persistent, long-term rise or fall

•Seasonal variation: a pattern that repeats itself at known regular intervals of time

-many economic time series show strong seasonal variation. Government agencies often adjust for this variation before releasing economic data; the data are then called seasonally adjusted (this helps avoid misinterpretation).

-residuals: what remains of a time series after the trend and seasonal variation have been removed
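
As a rough illustration of trend, seasonal variation, and residuals, here is a minimal Python sketch. It assumes one simple approach (a least-squares straight line for the trend, and the average detrended value at each position in the cycle for the seasonal variation); this is not the procedure government agencies use for seasonal adjustment. The function names, the `period` parameter, and the quarterly figures are made up for illustration.

```python
def linear_trend(series):
    """Fit a least-squares line to (0, y0), (1, y1), ... and return the fitted values."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return [y_mean + slope * (t - t_mean) for t in range(n)]


def decompose(series, period):
    """Split a series into trend, seasonal variation, and residuals."""
    trend = linear_trend(series)
    detrended = [y - f for y, f in zip(series, trend)]
    # Average the detrended values that share a position in the cycle.
    seasonal_means = [
        sum(detrended[i::period]) / len(detrended[i::period])
        for i in range(period)
    ]
    seasonal = [seasonal_means[t % period] for t in range(len(series))]
    # Residuals: what is left after removing trend and seasonal variation.
    residuals = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residuals


quarterly_sales = [20, 32, 41, 25, 24, 36, 45, 29]  # made-up data, two years
trend, seasonal, residuals = decompose(quarterly_sales, period=4)
print([round(r, 2) for r in residuals])
```

By construction, trend + seasonal + residual adds back up to the original observation at every time point.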

-exploratory data analysis: uses graphs and numerical summaries to describe the variables

in a data set and the relations among them

-distribution of a variable: tells us what values the variable takes and how often it takes these values
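
A small Python sketch of this definition: tally how often each value occurs. The function name `distribution` and the household sizes are made up for illustration.

```python
from collections import Counter


def distribution(values):
    """Return (value, count) pairs sorted by value."""
    return sorted(Counter(values).items())


household_sizes = [1, 2, 2, 3, 2, 4, 1, 2, 3, 5]  # made-up data
for value, count in distribution(household_sizes):
    # One '#' per observation gives a crude text histogram.
    print(value, "#" * count)
```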

1.2- Describing Distributions with Numbers

-numerical summaries make comparisons more specific

www.notesolution.com

-a brief description of a distribution should include its shape and numbers describing its center and spread, based on inspection of the histogram or stemplot

-graphs are an aid to understanding, not the answer

-measures of center are the mean (average value) and the median (middle value)

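-to figure out the mean $\bar{x}$: add the values of the observations and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ or, in more compact notation, $\bar{x} = \frac{1}{n} \sum x_i$

$\sum$ is the capital Greek letter sigma. In the formula for the mean it is short for "add them all up."

$\bar{x}$: the bar on top indicates the mean of all the x values.

$x_1, x_2, \ldots, x_n$: the subscripts keep the n observations separate. They do not necessarily indicate order or any other special facts about the data.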
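
The recipe for the mean (add the values, divide by the number of observations) can be sketched in Python as follows. The function name `mean` and the sample values are made up for illustration.

```python
def mean(observations):
    """Add all the values and divide by the number of observations n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the mean of no observations is undefined")
    return sum(observations) / n


travel_times = [10, 30, 5, 25, 40, 20]  # made-up data, in minutes
print(mean(travel_times))
```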

-the mean is sensitive to the influence of a few extreme observations, e.g. outliers. Because the mean cannot resist the influence of extreme values, it is not a resistant measure of center.
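
A quick way to see this sensitivity is to change one observation into an outlier and compare the mean with the median. This sketch uses Python's standard `statistics` module; the data are made up for illustration.

```python
import statistics

typical = [10, 12, 13, 15, 16]        # made-up data
with_outlier = [10, 12, 13, 15, 160]  # same data, largest value replaced by an outlier

for label, data in [("typical", typical), ("with outlier", with_outlier)]:
    # The mean is pulled toward the outlier; the median stays at the middle value.
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))
```

One extreme value moves the mean a long way, while the median does not change at all.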

-median: the formal version of the midpoint of a distribution. Half the observations are smaller than the median and the other half are larger than the median. Rule for finding the median:

1. arrange all values in order of size, from smallest to largest.

2. if the number of observations n is odd, the median M is the center value in the ordered list. Find the location of the median by counting (n + 1)/2 observations up from the bottom of the list.

3. if the number of observations n is even, the median M is the mean of the two center observations in the ordered list. The location of the median is again (n + 1)/2 from the bottom of the list, which now falls halfway between the two center observations.
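
The three steps of the median rule can be sketched in Python as follows. The function name `median` and the sample values are made up for illustration.

```python
def median(observations):
    """Find the median M by the three-step rule."""
    ordered = sorted(observations)  # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of no observations is undefined")
    if n % 2 == 1:
        # step 2: n odd, so location (n + 1)/2 is a whole number (1-based)
        location = (n + 1) // 2
        return ordered[location - 1]
    # step 3: n even, so location (n + 1)/2 falls between the two center values
    lower_center = ordered[n // 2 - 1]
    upper_center = ordered[n // 2]
    return (lower_center + upper_center) / 2


print(median([5, 1, 3]))     # odd n: the center value
print(median([4, 1, 3, 2]))  # even n: mean of the two center values
```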

- if the distribution is exactly symmetric, the mean and median are exactly the same

-don’t confuse the “average” value of a variable (the mean) with its “typical” value, which we might describe by the median

-quartiles: the mean and median describe only the center; quartiles tell us more about the spread or variability of a distribution (ex. of incomes or drug potencies) in addition to its center.

-the most useful descriptions give both a measure of center and a measure of spread

-describe spread or variability by giving several percentiles

-the median divides the data in two, so we call the median the 50th percentile. The upper quartile is the median of the upper half of the data, and the lower quartile is the median of the lower half.

-quartiles divide the data into 4 equal parts

-the pth percentile of a distribution is the value that has p percent of the observations falling at or below it

-to calculate a percentile, arrange the values in increasing order and count up the required percent from the bottom of the list. There is not always a value with exactly p percent of the data at or below it.
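
Because there is not always a value with exactly p percent of the data at or below it, any code has to pick a convention. The sketch below assumes one simple choice: return the smallest value that has at least p percent of the observations at or below it. Other textbooks and software use different conventions, so results may differ slightly. The function name `percentile` and the data are made up for illustration.

```python
import math


def percentile(observations, p):
    """Smallest value with at least p percent of observations at or below it.

    Assumption: this is one convention among several; p is in (0, 100].
    """
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    ordered = sorted(observations)  # increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    # Count up the required percent from the bottom of the list.
    count = max(1, math.ceil(p * n / 100))
    return ordered[count - 1]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up data
print(percentile(data, 25), percentile(data, 50), percentile(data, 90))
```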

-quartiles Q1 and Q3: to calculate the quartiles:

1. arrange values in increasing order and locate median M in the ordered list.

2. the first quartile Q1 is the median of the values whose position in the ordered list is to the left of the location of the overall median.

3. the third quartile Q3 is the median of the values whose position in the ordered list is to the right of the location of the overall median.
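
The quartile rule can be sketched in Python as follows. The helper `median` repeats the median rule so the block stands on its own; the function names and the data are made up for illustration. Statistical software often uses other quartile conventions (for example, interpolation), so its results may differ slightly from this rule.

```python
def median(ordered):
    """Median of a list that is already sorted in increasing order."""
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def quartiles(observations):
    """Return (Q1, M, Q3) using the median-of-each-half rule."""
    ordered = sorted(observations)  # step 1: increasing order
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    # Values to the left of the location of the overall median.
    left = ordered[:half]
    # Values to the right; when n is odd the median itself is excluded.
    right = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(left), median(ordered), median(right)


print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # odd n
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # even n
```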
