CHAPTER 1 NOTES

Department: Statistics
Course Code: STAB22H3
Professor: Moras

Examining Distributions
-in any graph of data, look for the overall pattern and for dramatic deviations from that
pattern
-describe the overall pattern by its shape, center, and spread
-an important kind of deviation is an outlier, an individual value that falls outside the overall
pattern
-describe the center of a distribution by its midpoint, the value with roughly half the
observations taking smaller values and half taking larger values
-describe the spread of a distribution by giving the smallest and largest values; the quartiles
Q1 and Q3 describe the spread of the middle half of the data
-stemplots and histograms display these features. A stemplot turned on its side resembles a
histogram, with the larger values lying to the right.
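A stemplot can be built by hand or with a few lines of code. The sketch below is only an illustration: it assumes non-negative whole-number data, uses the tens digit(s) as the stem and the units digit as the leaf, and the data values are made up.

```python
from collections import defaultdict


def stemplot(values):
    """Return the lines of a stemplot for non-negative integers.

    Assumption for this sketch: stem = value // 10, leaf = value % 10.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)

    lines = []
    # Include every stem between the smallest and largest, even empty ones,
    # so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {leaf_text}")
    return lines


data = [12, 15, 21, 23, 23, 28, 34, 35, 36, 41, 58]  # made-up values
for line in stemplot(data):
    print(line)
```

Reading the printed rows from top to bottom corresponds to reading a histogram from left to right once the plot is turned on its side.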
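-Describing shape:
Does the distribution have one or several major peaks, called modes? Unimodal: one
major peak.
Is it symmetric or skewed? Symmetric: the values smaller and larger than the midpoint
are mirror images of each other. Ex. heights of young women. Skewed: one tail extends
much farther than the other; skewed to the right means the right tail (larger values) is
longer. Ex. money amounts are usually skewed to the right.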
-outliers: look for points that are clearly apart from the body of data, not just the most
extreme observations in a distribution. Sometimes they point to errors made in recording
the data.
-time plots (pg. 19): a time plot of a variable plots each observation against the time at
which it was measured. Always put time on the horizontal (x) scale of your plot and the
variable you are measuring on the vertical (y) scale. Connecting the points helps show
change over time. When data are collected over time, plot the observations in time order.
Displays such as stemplots and histograms ignore time order, so they can be misleading
when there is systematic change over time.
-time series: measurements of a variable taken at regular intervals over time. Government,
economic, and social data are often published as time series. Ex. the monthly unemployment
rate and the quarterly gross domestic product. Time plots reveal the main features of a time
series.
-in a time series:
Trend: a persistent, long-term rise or fall
Seasonal variation: a pattern that repeats itself at known regular intervals of time
-many economic time series show strong seasonal variation. Government agencies adjust for
this variation before releasing economic data; such data are called seasonally adjusted (this
helps avoid misinterpretation).
-residuals: what remains of a time series after the patterns (trend and seasonal variation)
are removed
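To make trend, seasonal variation, and residuals concrete, here is a minimal sketch that splits a series into the three parts. It rests on assumptions that are not in the notes: an additive model (observation = trend + seasonal + residual), a straight-line trend fitted by least squares, and a known season length `period`. It is not the procedure government agencies use for seasonal adjustment, and the function name `decompose` and the data are hypothetical.

```python
def decompose(series, period):
    """Split a series into (trend, seasonal, residuals) under an additive model.

    Assumptions for this sketch: linear trend, fixed seasonal pattern of
    length `period`, and at least two observations.
    """
    n = len(series)
    times = list(range(n))
    t_mean = sum(times) / n
    y_mean = sum(series) / n

    # Least-squares straight line through (time, value) points.
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, series)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    intercept = y_mean - slope * t_mean
    trend = [intercept + slope * t for t in times]

    # Seasonal effect: average detrended value at each position in the cycle.
    detrended = [y - tr for y, tr in zip(series, trend)]
    seasonal_means = []
    for k in range(period):
        same_season = detrended[k::period]
        seasonal_means.append(sum(same_season) / len(same_season))
    seasonal = [seasonal_means[i % period] for i in range(n)]

    # Residuals: what is left after removing trend and seasonal variation.
    residuals = [y - tr - s for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, residuals


# Made-up quarterly values with an upward drift and a repeating pattern.
quarterly = [10, 14, 9, 12, 13, 17, 12, 15, 16, 20, 15, 18]
trend, seasonal, residuals = decompose(quarterly, period=4)
print([round(r, 2) for r in residuals])
```

By construction the three parts add back up to the original series, so large residuals flag observations that neither the trend nor the seasonal pattern explains.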
-exploratory data analysis: uses graphs and numerical summaries to describe the variables
in a data set and the relations among them
-distribution of a variable: tells us what values the variable takes and how often it takes
these values
1.2- Describing Distributions with Numbers
-numerical summaries make comparisons more specific
www.notesolution.com
-a brief description of a distribution should include its shape and numbers describing its
center and spread; describe the shape based on inspection of the histogram or stemplot
-graphs are an aid to understanding, not the answer
-measures of center are the mean (average value) and the median (middle value)
-to find the mean $\bar{x}$: add the values and divide by the number of observations. If
the n observations are $x_1, x_2, \ldots, x_n$, their mean is
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, or in more compact notation, $\bar{x} = \frac{1}{n} \sum x_i$
$\sum$: the capital Greek letter sigma, short for "add them all up".
$\bar{x}$: the bar on top indicates the mean of all the x values.
$x_1, x_2, \ldots, x_n$: the subscripts keep the n observations separate. They do not
necessarily indicate order or any other special facts about the data.
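As a quick check of the formula, the mean can be computed directly as "add them all up, then divide by n". The observations below are made up for illustration.

```python
def mean(xs):
    """Mean of a non-empty list: sum of the observations divided by n."""
    return sum(xs) / len(xs)


observations = [4, 8, 15, 16, 23, 42]  # made-up data, n = 6
print(mean(observations))  # 108 / 6 = 18.0
```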
-the mean is sensitive to the influence of a few extreme observations, e.g. outliers. Since
the mean cannot resist the influence of extreme values, it is not a resistant measure of center.
-median: formal version of midpoint of a distribution. Half the observations are smaller
than the median and the other half are larger than the median. Rule for finding the
median:
1. arrange all values in order of size, from smallest to largest.
2. if the number of observations n is odd, the median M is the center value in the ordered
list. Find the location of the median by counting (n + 1)/2 observations up from the bottom
of the list
3. if the number of observations n is even, the median M is the mean of the two center
observations in the ordered list. The location of the median is again (n+1)/2 from the bottom
of the list; this is not a whole number, so it falls halfway between the two center observations.
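The three-step rule above can be sketched as a short function. The function name and sample data are illustrative only.

```python
def median(values):
    """Median by the rule above: sort, then take the center value (odd n)
    or the mean of the two center values (even n)."""
    ordered = sorted(values)          # step 1: arrange in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # step 2: location (n + 1) / 2 counted from 1 is index n // 2 from 0
        return ordered[mid]
    # step 3: average the two center observations
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # odd n: center value
print(median([4, 1, 3, 2]))  # even n: mean of the two center values
```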
- if the distribution is exactly symmetric, the mean and median are exactly the same
-don't confuse the average value of a variable (the mean) with its typical value, which we
might describe by the median
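A small experiment shows why the mean and the typical value can differ. The sketch uses Python's standard `statistics` module; the income figures are hypothetical (in thousands of dollars).

```python
import statistics

incomes = [30, 32, 35, 38, 40]   # hypothetical incomes, roughly symmetric
with_outlier = incomes + [400]   # add one extreme observation

# Without the outlier the mean and median agree.
print(statistics.mean(incomes), statistics.median(incomes))

# With the outlier the mean is pulled far to the right,
# while the median barely moves.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The median stays near the bulk of the data, which is why it is often the better description of a typical value for right-skewed data such as incomes.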
-quartiles: describe the spread or variability of the data (e.g. incomes and drug potencies),
not just the center
-the most useful descriptions give both a measure of center and a measure of spread
-describe spread or variability by giving several percentiles
-the median divides the data in two, so we call the median the 50th percentile. The upper
quartile is the median of the upper half of the data (likewise, the lower quartile is the median
of the lower half).
-quartiles divide the data into 4 equal parts
-the pth percentile of a distribution is the value that has p percent of the observations at or
below it
-to calculate a percentile, arrange the values in increasing order and count up the required
percent from the bottom of the list. There is not always a value with exactly p percent of the
data at or below it.
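The counting rule can be sketched in code. Because there is not always a value with exactly p percent of the data at or below it, some convention is needed; the sketch below assumes the "nearest rank" convention (the smallest value with at least p percent at or below it), which is one common choice and not necessarily the one the textbook uses. The function names and scores are hypothetical.

```python
import math


def percent_at_or_below(values, x):
    """Percent of observations that are less than or equal to x."""
    return 100 * sum(1 for v in values if v <= x) / len(values)


def percentile_nearest_rank(values, p):
    """Smallest observation with at least p percent of the data at or below it.

    Assumption: nearest-rank convention, with 0 < p <= 100.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # count up from the bottom
    return ordered[rank - 1]


scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]  # made-up data, n = 10

p50 = percentile_nearest_rank(scores, 50)
print(p50, percent_at_or_below(scores, p50))   # exactly 50 percent at or below

p25 = percentile_nearest_rank(scores, 25)
print(p25, percent_at_or_below(scores, p25))   # no value has exactly 25 percent
```

With ten observations no value has exactly 25 percent of the data at or below it, so the nearest-rank rule returns the first value that reaches at least 25 percent.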
-quartiles Q1 and Q3: to calculate the quartiles:
1. arrange the values in increasing order and locate the median M in the ordered list.
2. the first quartile Q1 is the median of the values whose position in the ordered list is to the
left of the location of the overall median.
3. the third quartile Q3 is the median of the values whose position in the ordered list is to the
right of the location of the overall median.
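The quartile rule above can be sketched as follows. When n is odd, the overall median is itself an observation and is left out of both halves, which matches "to the left of" and "to the right of" its location. The function names and data are illustrative, and the sketch assumes at least two observations.

```python
def _median_of_sorted(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves rule from the notes."""
    ordered = sorted(values)                 # step 1: order and locate M
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                   # positions left of the median
    # For odd n skip the median itself; for even n the upper half starts at half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        _median_of_sorted(lower),            # step 2: Q1
        _median_of_sorted(ordered),          # overall median M
        _median_of_sorted(upper),            # step 3: Q3
    )


print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # odd n
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # even n
```

Statistical software often uses slightly different quartile rules, so results from a library may differ a little from this hand method on small data sets.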