Department
Statistics
Course Code
STAB22H3
Professor
Moras

Chapter 2- Looking at Data- Relationships
-associated: two variables are associated when knowing the value of one tells you
something about the value of the other. Ex. breed and life span
-examining relationships:
What individuals or cases do the data describe?
What variables are present? How are they measured?
Which variables are quantitative and which are categorical?
Ex. on page 85
-response variable: measures an outcome of a study
-explanatory variable: explains or causes changes in the response variable. Note: many
explanatory-response relationships do not involve direct causation. Ex. SAT scores of high
school students help predict future college grades, but high SAT scores don't CAUSE high
college grades.
-independent variables: another name for explanatory variables
-dependent variables: another name for response variables
Response variables depend on explanatory variables
2.1- Scatterplots
-scatterplots: for showing relationship between two quantitative variables measured on
the same individuals.
-explanatory variable on the x axis, called x. (If there is no explanatory variable, either
variable can go on either axis.)
Response variable on y axis called y.
-Interpreting scatterplots:
Look for the overall pattern and for deviations from that pattern. Ex. an outlier is a point
that falls outside the overall pattern of the relationship.
Describe overall pattern by the form, direction, and strength of the relationship.
-form: ex. clusters
Pg. 87 fig. 2.1 has two clusters
Clusters: groups of points on the graph. They suggest that the data describe several
distinct kinds of individuals.
-positively associated: two variables are positively associated when above average values
of one tend to accompany above average values of the other, and below average values also
tend to occur together
-negatively associated: two variables are negatively associated when above average values
of one tend to accompany below average values of the other, and vice versa.
-linear relationship: points roughly follow a straight line
Strength of relationship: determined by how closely the points follow a clear form
-to add a categorical variable to a scatterplot, use a different plot colour or symbol for each
category
-smoothing: a systematic method of extracting the overall pattern from a scatterplot.
Smoothing methods use resistant calculations, so they are not affected by outliers in the plot.
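The notes do not say which smoothing method the textbook uses. As a rough sketch only, a running median is one simple resistant smoother: each value is replaced by the median of itself and its neighbours. The function name `running_median`, the window size, and the data are made up for illustration, and the y values are assumed to be already sorted by x.

```python
from statistics import median

def running_median(ys, window=3):
    """Smooth ys (assumed sorted by x) with a centered running median.

    The median is resistant, so a single outlier does not drag the
    smoothed values toward it. Windows are shortened at the two ends.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(ys)):
        lo = max(0, i - half)
        hi = min(len(ys), i + half + 1)
        smoothed.append(median(ys[lo:hi]))
    return smoothed

# Made-up response values with one outlier (40)
ys = [1, 2, 3, 40, 5, 6, 7]
smooth = running_median(ys)
print(smooth)  # the outlier 40 does not appear in the smoothed values
```

A running mean over the same window would be pulled up by the 40, which is why resistant calculations are preferred here.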
www.notesolution.com
-to display a relationship between a categorical explanatory variable and a quantitative
response variable, make a side by side comparison of the distributions of the response for
each category.
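A side-by-side comparison can be sketched numerically by grouping the response values by category and summarizing each group. The example below is a minimal sketch: the helper name `summarize_by_category` and the breed-size and life-span numbers are invented for illustration, not taken from the textbook.

```python
from collections import defaultdict
from statistics import median

def summarize_by_category(pairs):
    """Group (category, response) pairs and return (min, median, max) per category."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {c: (min(v), median(v), max(v)) for c, v in groups.items()}

# Made-up data: breed size (categorical) and life span in years (quantitative)
data = [
    ("small", 14), ("small", 15), ("small", 16),
    ("large", 9), ("large", 10), ("large", 12),
]
summary = summarize_by_category(data)
for category in sorted(summary):
    print(category, summary[category])
```

Reading the summaries next to each other plays the same role as comparing the distributions side by side in a plot.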
2.2- Correlation
-correlation: a numerical measure used to supplement the graph, since our eyes are not
good judges of how strong a relationship is.
Correlation measures the direction and strength of the linear relationship between two
quantitative variables. It is usually written as r.
r is positive when there is a positive association between the variables. Ex. height and
weight have a positive correlation.
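Ex. suppose we have data on variables x and y for n individuals. The means and standard
deviations of the two variables are $\bar{x}$ and $s_x$ for the x values and $\bar{y}$ and
$s_y$ for the y values. The correlation r between x and y is
$$r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
$\sum$ means: add these terms for all the individuals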
This formula helps us see what correlation is but is not convenient for actually calculating r.
The formula starts by standardizing the observations.
-ex. if x is height in centimeters and y is weight, then
$$\frac{x_i - \bar{x}}{s_x}$$
is the standardized height of the ith person. The standardized height says how many standard
deviations above or below the mean a person's height lies. Standardized values have no units;
they are no longer measured in centimeters. The correlation r is an average of the products of
the standardized height and the standardized weight for the n people.
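The definition above (standardize each observation, multiply the pairs, add them up, divide by n - 1) can be followed step by step in code. This is a minimal sketch using only the Python standard library; the function name `correlation` and the height and weight values are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    zx = [(x - x_bar) / s_x for x in xs]  # standardized x values, no units
    zy = [(y - y_bar) / s_y for y in ys]  # standardized y values, no units
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up heights (cm) and weights (kg) for five people
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 77, 90]
r = correlation(heights, weights)
print(round(r, 3))  # positive and close to 1: taller people here tend to weigh more
```

In practice software computes r directly; the point of the sketch is only to mirror the formula term by term.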
-properties of correlation:
It makes no difference which variable you call x and which you call y when calculating the correlation.
Correlation requires that both variables be quantitative, so that it makes sense to do the arithmetic indicated by the formula for r. Ex. a correlation can't be calculated with city because it is categorical.
Because r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. Ex. changing cm -> inches or kg -> lbs doesn't change the correlation between weight and height.
Correlation r has no unit of measurement.
Positive r indicates positive association between the variables and negative r indicates negative association.
Correlation r is always a number between -1 and 1. Values of r near 0 mean a very weak linear relationship. The strength of the relationship increases as r moves away from 0 toward either -1 or 1. Values of r close to -1 or 1 mean that the points lie close to a straight line. The extreme values r = -1 and r = 1 occur only when the points in a scatterplot lie exactly along a straight line.
Correlation measures the strength of only the linear relationship between two variables. It does not describe curved relationships between variables, no matter how strong they are.
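Several of these properties can be checked numerically. The sketch below defines a small helper `correlation` (a hypothetical name, implementing the standardized-products formula) and uses made-up data to show that swapping x and y or changing units leaves r unchanged, that points exactly on a line give r = -1 or 1, and that a strong curved relationship can still have r equal to 0.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Made-up heights (cm) and weights (kg)
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 90]
r = correlation(heights_cm, weights_kg)

# Swapping x and y does not change r
r_swapped = correlation(weights_kg, heights_cm)

# Changing units (cm -> inches, kg -> lbs) does not change r
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.2046 for w in weights_kg]
r_units = correlation(heights_in, weights_lb)

# Points exactly on a falling straight line give r = -1
r_line = correlation([1, 2, 3, 4], [10, 8, 6, 4])

# A perfect curved relationship (y = x squared) can still give r = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = correlation(xs, [x ** 2 for x in xs])

print(r, r_swapped, r_units, r_line, r_curve)
```

The last case is the reason to always look at the scatterplot first: r near 0 says only that there is no linear relationship, not that there is no relationship.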