# ENVS178 Lecture Notes - Lecture 5: Waskesiu Lake, Environment And Climate Change Canada, Statistical Inference

by OC764244

School: University of Waterloo

Department: Environmental Studies

Course Code: ENVS178

Professor: Jeff Casello

Lecture: 5

LOOKING AT DATA RELATIONSHIPS:

CORRELATION AND REGRESSION

De Veaux et al., Chapters 7 and 8

Sometimes, data analysis is defined by the number of variables considered. Here are several cases:

Univariate analysis : »

Bivariate analysis : »

Multivariate analysis : »

In bivariate and multivariate analysis, you are usually interested in showing that one variable helps to explain another. In this case special terms are used.

Response variable : »

Explanatory variable : »

Trying to estimate relationships amongst variables is often best done using visual inspection.

One tool to help is known as a “scatterplot”.

Scatterplots:

Each dot represents: »

The scatter of dots represents: »

Example of a scatterplot. Both plots illustrate the same data; note the effect of scale.

[Figure: two scatterplots, each titled "Male versus Female Death Rates in Canada between 1921 and 1997". Horizontal axis: male deaths per 1000 population. Vertical axis: female deaths per 1000 population. In the first plot the axes run from 0 to 15 (horizontal) and 0 to 12 (vertical); in the second they run from 6 to 13 (horizontal) and 4 to 12 (vertical).]

Source of data: Canadian Global Almanac 2000, Macmillan Canada


Which dots represent which years?

In the above example, the direction of the scatter is clear: as the values for one variable increase (or decrease), so do the values for the second variable. This is called »

By contrast, if the values of one variable increase as the values of the other variable decrease, this is referred to as »
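The idea of direction can be checked numerically as well as visually. The sketch below is only an illustration: it uses the sign of the sample covariance as a stand-in for "reading the direction off the plot", the labels "positive" and "negative" are common textbook terms, and the data values are invented for the example (they are not the Canadian death-rate data).

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def direction(xs, ys):
    """Label the direction of a scatter from the sign of the covariance."""
    cov = covariance(xs, ys)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


# Made-up values for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 7, 9]    # tends to increase as x increases
falling = [9, 7, 5, 4, 2]   # tends to decrease as x increases

print(direction(x, rising))
print(direction(x, falling))
```

The sign only captures direction; it says nothing about the form or strength of the relationship, which still need to be judged from the plot.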

A scatterplot also provides information on the form of the relationship.

• Linear relationship: »

• Curvilinear relationship: »

What is the form of the relationship between male death rate and female death rate in Canada over time?

You will have noted that there was little scatter in the death rate plot, which says something about the strength of the relationship.

What would the following data sets look like graphically?

• Strong linear relationship

»

• Moderately strong linear relationship

»

• Weak relationship (form and direction unclear)

»
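One way to put a number on "strength" is the Pearson correlation coefficient r, which is close to +1 or -1 when the dots hug a straight line and close to 0 when they do not. The sketch below is an assumption-laden illustration, not part of the lecture: the three data sets are invented by adding a fixed "noise" pattern of different sizes to a straight line, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: a straight line plus a fixed scatter pattern of varying size.
x = list(range(1, 11))
noise = [1, -1, 2, -2, 0, 1, -2, 2, -1, 0]

strong = [xi + 0.2 * e for xi, e in zip(x, noise)]      # little scatter
moderate = [xi + 2.0 * e for xi, e in zip(x, noise)]    # more scatter
weak = [0.1 * xi + 3.0 * e for xi, e in zip(x, noise)]  # scatter dominates

r_strong = pearson_r(x, strong)
r_moderate = pearson_r(x, moderate)
r_weak = pearson_r(x, weak)

for label, r in [("strong", r_strong), ("moderate", r_moderate), ("weak", r_weak)]:
    print(f"{label}: r = {r:.2f}")
```

Plotting each of these data sets as a scatterplot would show the same ordering visually: the tighter the dots cluster around a line, the larger the magnitude of r.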


Correlation Analysis

Correlation analysis was invented by Sir Francis Galton (1822-1911), while he was thinking about the resemblance between parents and children. Statisticians in Victorian England were fascinated by family resemblances and gathered data on the topic.

One such hypothetical data set, which provides the adult heights of 36 paired fathers and sons, is depicted below. I have superimposed the best-fit line on the scatter. Note how it compares to the line where X=Y.

What does each of the following represent in the above scatterplot?

• each dot »

• the x-co-ordinate of the dot (on the horizontal axis) (explanatory variable) »

• the y-co-ordinate of the dot (on the vertical axis) (response variable) »

What does the 45-degree line tell us? »

Where is the best-fit line positioned? »

Using the best-fit line, can we accurately predict a son's height based solely on the father's height? »
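To make the questions above concrete, here is a minimal sketch of how a best-fit line could be computed and compared with the line X=Y. It assumes the best-fit line is the ordinary least-squares line, and the six father/son height pairs (in cm) are invented for illustration; they are not the 36 pairs from the lecture. The function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented heights in cm, for illustration only.
fathers = [165, 170, 175, 180, 185, 190]
sons = [170, 171, 176, 177, 182, 183]

slope, intercept = least_squares_line(fathers, sons)

# Residual = observed son's height minus the height predicted by the line.
residuals = [s - (intercept + slope * f) for f, s in zip(fathers, sons)]

print(f"best-fit line: son = {intercept:.1f} + {slope:.3f} * father")
print("flatter than the X=Y line" if slope < 1 else "at least as steep as the X=Y line")
print("residuals:", [round(r, 2) for r in residuals])
```

In this made-up data the fitted slope is less than 1, so the line is flatter than the 45-degree line, and the residuals are not all zero, so the line gives an estimate of a son's height rather than an exact prediction.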
