Department

Biology

Course Code

BIO360H5

Professor

Helene Wagner

Chapter 3: Displaying and Describing Categorical Data

Bar graphs / Pie graphs/Segmented Bar Graphs – display categorical data (not quantitative!)

-Pie graphs and Segmented Bar Graphs must add up to 100%

-Simpson’s Paradox: a trend seen within each of several groups can reverse when the groups are combined, so averages taken within groups can appear to contradict the overall averages
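A minimal Python sketch of Simpson's paradox. The treatment names, group names and counts are invented for illustration only: A has the higher success rate within each group, yet B has the higher rate overall, because the group sizes are very uneven.

```python
# Hypothetical (successes, trials) counts, invented for illustration only.
counts = {
    "A": {"group 1": (9, 10), "group 2": (30, 100)},
    "B": {"group 1": (80, 100), "group 2": (2, 10)},
}

def group_rate(treatment, group):
    """Success rate of one treatment within one group."""
    successes, trials = counts[treatment][group]
    return successes / trials

def overall_rate(treatment):
    """Success rate of one treatment with both groups pooled."""
    total_s = sum(s for s, _ in counts[treatment].values())
    total_n = sum(n for _, n in counts[treatment].values())
    return total_s / total_n

for g in ("group 1", "group 2"):
    print(g, "A:", group_rate("A", g), "B:", group_rate("B", g))
print("overall", "A:", round(overall_rate("A"), 3), "B:", round(overall_rate("B"), 3))
```

Comparing the per-group lines with the overall line shows the reversal.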

Marginal Distribution: divide row and column totals by grand total (doesn’t tell us anything about the other variable)

Joint Distribution: divide each cell by grand total

Conditional Distribution: divide each cell by its column total (or by its row total, depending on the research question). In a contingency table, if the conditional distribution of one variable is the same for every category of the other, the variables are independent
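The three distributions can be sketched in Python on a small contingency table. The table (sex by eye colour) and its counts are hypothetical, and conditioning on rows is only one of the two choices named in the notes.

```python
# Hypothetical contingency table: rows = sex, columns = eye colour.
table = {
    "female": {"blue": 20, "brown": 30},
    "male": {"blue": 10, "brown": 40},
}
col_names = ["blue", "brown"]

row_totals = {r: sum(cols.values()) for r, cols in table.items()}
col_totals = {c: sum(table[r][c] for r in table) for c in col_names}
grand_total = sum(row_totals.values())

# Marginal: row and column totals divided by the grand total.
marginal_rows = {r: t / grand_total for r, t in row_totals.items()}
marginal_cols = {c: t / grand_total for c, t in col_totals.items()}

# Joint: every cell divided by the grand total.
joint = {r: {c: n / grand_total for c, n in cols.items()}
         for r, cols in table.items()}

# Conditional (here: eye colour given sex): each cell divided by its row total.
conditional_on_row = {r: {c: n / row_totals[r] for c, n in cols.items()}
                      for r, cols in table.items()}

# Independent only if every row's conditional distribution matches the
# marginal distribution of the columns (small tolerance for rounding).
independent = all(
    abs(conditional_on_row[r][c] - marginal_cols[c]) < 1e-9
    for r in table for c in col_names
)
print(marginal_cols, conditional_on_row, independent)
```

In this made-up table the share of blue eyes differs between the rows, so the variables are not independent.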

Chapter 4: Displaying and Summarizing Quantitative Data

Chapter 5: Understanding and Comparing Distributions

Histogram: shows distribution of a quantitative variable (each bar represents the frequency or

relative frequency of values falling in each bin); no gaps unlike bar graph!

The five-number summary of a distribution reports its median, quartiles and extremes (maximum and minimum):

Max, Q3, Median, Q1, Min – all summarized in a boxplot, which is used for QUANTITATIVE DATA (a longer boxplot means more variability in the data)

If the histogram is symmetric and no outliers -> use mean for measure of center and standard

deviation for measure of spread

•Standard deviation – measures how far each data value is from the mean

- the square root of the variance: $s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$

If the histogram is skewed or has outliers -> use median and Interquartile Range (IQR)

IQR = Q3 – Q1 (the IQR contains the middle 50% of the data values)

Potential outliers: values more than 1.5 × IQR beyond either end of the box (below Q1 – 1.5 × IQR or above Q3 + 1.5 × IQR)
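A short Python sketch of these summaries using the standard `statistics` module. The data values are hypothetical. Textbooks and software compute quartiles in slightly different ways; the `method="inclusive"` setting used here is one common convention, not necessarily the course's.

```python
import statistics

values = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data with one high value

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample SD: divides by n - 1

# Quartiles (one of several conventions for computing them).
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
five_number = (min(values), q1, median, q3, max(values))

iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < low_fence or v > high_fence]

print("mean", round(mean, 2), "sd", round(sd, 2))
print("five-number summary", five_number)
print("IQR", iqr, "potential outliers", outliers)
```

The single high value pulls the mean above the median, which is why the median and IQR are the safer summaries for data like these.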

Chapter 6: The Standard Deviation as a ruler and the Normal Model

Standard Normal Model (z-scores) – a z-score is the distance of a data value from the mean, measured in units of standard deviations. Standardizing data into z-scores does not change the shape of the distribution, but the center becomes mean = 0 and the spread becomes SD = 1: N(0,1)

www.notesolution.com

-Normal models are appropriate for distributions whose shape is unimodal and roughly symmetric

z-score = (observed – mean) / standard deviation: $z = \frac{y - \mu}{\sigma}$

When comparing two z-scores: the farther a z-score is from 0 (negative or positive), the more unusual the value is.

If asked which value is “more likely”, choose the one whose z-score is closer to 0

Negative z-score = data value is below the mean

Positive z-score = data value is above the mean

68-95-99.7 Rule – in a Normal model, about 68% of values lie within 1 standard deviation of the mean, about 95% within 2 and about 99.7% within 3.
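A small Python check of the two ideas above, using `statistics.NormalDist` from the standard library. The `heights` data and the helper names are hypothetical. The sketch standardizes a data set (the mean becomes 0 and the SD becomes 1) and computes the share of a Normal model within 1, 2 and 3 SDs of the mean.

```python
from statistics import NormalDist, mean, stdev

def z_scores(values):
    """Standardize values using the sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def share_within(k):
    """Share of a Normal model lying within k SDs of its mean."""
    standard = NormalDist(mu=0, sigma=1)
    return standard.cdf(k) - standard.cdf(-k)

heights = [150, 160, 165, 170, 180]  # hypothetical data
z = z_scores(heights)
print("mean of z:", round(mean(z), 6), "SD of z:", round(stdev(z), 6))

for k in (1, 2, 3):
    print("within", k, "SD:", round(share_within(k), 4))
```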

Normal probability (quantile) plot (qq- plot): checks for Nearly Normal Condition; straight

line indicates normal distribution (unimodal and roughly symmetric)

Histogram Symmetrical and unimodal : boxplot has smaller SD and smaller IQR

Histogram right-skewed : mean is larger than the median (mean is pulled toward the tail)

from looking at the boxplot - upper quartile (Q3) is farther from the median than the lower quartile (Q1), since the data values are packed more tightly at the low end

from looking at normal probability plot – hockey stick shape pointing left

Histogram left-skewed : median is larger than the mean (data values are packed more tightly at the high end)

Boxplot: upper quartile is closer to the median than lower quartile (see quiz 5 #2)

The median is close to the max, while there is a large gap between the min and the median.

•Z-table : always gives lower-tail probabilities (the area to the left of z). Ex. If asked to find how long the longest 20% of pregnancies last, look up p = 1 – 0.2 = 0.8

•Ex.2: what percent will last at least 300 days (meaning the area above 300 days!)? Find z, then use p = 1 – (table value)
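The two table look-ups can be sketched with `statistics.NormalDist`, which works with lower-tail areas just like the z-table. The notes do not give the model for pregnancy length, so the mean of 266 days and SD of 16 days below are assumptions made only so the example runs.

```python
from statistics import NormalDist

# Assumed model, NOT given in the notes: pregnancy length ~ N(266, 16) days.
pregnancy = NormalDist(mu=266, sigma=16)

# Longest 20%: the cutoff has 80% of the area below it, so look up p = 0.8.
cutoff_days = pregnancy.inv_cdf(1 - 0.20)

# At least 300 days: find z, then upper tail = 1 - lower-tail area.
z = (300 - pregnancy.mean) / pregnancy.stdev
share_at_least_300 = 1 - NormalDist().cdf(z)

print("longest 20% last more than about", round(cutoff_days, 1), "days")
print("z for 300 days:", z)
print("share lasting at least 300 days:", round(share_at_least_300, 4))
```

`inv_cdf` plays the role of reading the table backwards (from area to z), and `cdf` the role of reading it forwards.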

Chapter 7: Scatterplots, Association and Correlation

Scatterplot: relationship between two quantitative variables; describe by direction (+ve or –ve),

form (linear or non-linear) and strength (amount of scatter)

x = predictor (explanatory) variable, y = response variable (mnemonic: P.E. -> result: have to run)

Correlation Coefficient (r) – measures the strength of the linear association between two quantitative variables. Appropriate to use correlation only if the scatterplot is linear and has no outliers


Correlation 1) is not affected by rescaling, 2) has no units, 3) is sensitive to outliers and 4) is always between -1 and 1

-Even if there is a relationship (high correlation coefficient) it doesn’t prove causation

-Correlation always has the same sign as the slope of the regression
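A hedged Python sketch of the correlation properties listed above. The data are hypothetical, and `correlation` is a hand-written helper that averages the products of z-scores (dividing by n - 1). It checks that shifting and rescaling either variable leaves r unchanged, and that one outlier can lower r sharply.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]               # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(x, y)

# Change the units of both variables (shift and rescale): r is unchanged.
r_rescaled = correlation([10 * v + 3 for v in x], [v / 100 for v in y])

# Add a single outlying point: r drops.
r_with_outlier = correlation(x + [6], y + [0.0])

print(round(r, 4), round(r_rescaled, 4), round(r_with_outlier, 4))
```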

Spearman’s Rho – for bent relationships; For each variable, replace values by ranks and

calculate correlation between ranks.
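The rank-then-correlate recipe can be sketched in Python as follows. The data are hypothetical, and the `ranks` helper assumes there are no tied values (ties need averaged ranks, which this sketch does not handle). For a curved but steadily increasing relationship, the ordinary correlation is below 1 while Spearman's rho is 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Correlation between the ranks of xs and the ranks of ys."""
    return correlation(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]      # hypothetical data
y = [v ** 3 for v in x]     # bent (curved) but always increasing

r = correlation(x, y)
rho = spearman_rho(x, y)
print("Pearson r:", round(r, 4), "Spearman rho:", round(rho, 4))
```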

If non-linear, can use transformations to re-express the data and make the relationship more linear

Chapter 8: Linear Regression

Linear model

Response = Intercept + Slope × predictor + Error (residual) term : $y = b_0 + b_1 x + e$

-The line minimizes the sum of the squared vertical distances from the points to the line

-Predictions based on the regression line are for average values of y for a given x. The

actual ‘y’ will vary around the predicted value

Slope : means that for every 1-unit increase in x, the response y is expected to change by $b_1$ units of y

Y-intercept: the predicted value of y when x = 0.

-$b_1 = r \frac{s_y}{s_x}$, where $b_1$ = slope, $r$ = correlation, $s_y$ = standard deviation of the y variable and $s_x$ = standard deviation of the x variable
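A minimal Python sketch of the slope formula on hypothetical data. The intercept formula used here, b0 = mean(y) - b1 × mean(x), is the standard least-squares result and is not stated in the notes.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]                # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
b1 = r * stdev(y) / stdev(x)       # slope from the formula in the notes
b0 = mean(y) - b1 * mean(x)        # intercept (standard result, see lead-in)

def predict(x_value):
    """Predicted average y for a given x."""
    return b0 + b1 * x_value

print("slope", round(b1, 3), "intercept", round(b0, 3))
print("prediction at x = 3:", round(predict(3), 3))
```

The slope has the same sign as r, since both standard deviations are positive.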

$R^2$ (squared correlation): the fraction of the data’s variance accounted for by the linear model. The x variable explains __% (the R-Sq value) of the variation in the y variable. However, individual predictions may still be far off.

-If you change the units of the y-variable: the slope of the regression changes (b/c the units have changed) but the correlation and the $R^2$ value do not change
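The effect of a unit change can be checked with a short Python sketch. The data are hypothetical, and the kilogram-to-gram conversion is just an example of rescaling y: the slope is multiplied by the conversion factor, while r and R² stay the same.

```python
from statistics import mean

def fit(xs, ys):
    """Return (slope, r) for the least-squares line of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]                  # hypothetical data
y_kg = [2.0, 4.0, 5.0, 4.0, 5.0]     # response in kilograms
y_g = [1000 * v for v in y_kg]       # same response in grams

slope_kg, r_kg = fit(x, y_kg)
slope_g, r_g = fit(x, y_g)

print("slope (kg):", round(slope_kg, 3), "slope (g):", round(slope_g, 3))
print("R^2 (kg):", round(r_kg ** 2, 3), "R^2 (g):", round(r_g ** 2, 3))
```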

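Residuals = observed value (collected data) - predicted value : $e_i = y_i - \hat{y}_i$

- negative residual: observed < predicted (the model overestimates)

- positive residual: observed > predicted (the model underestimates)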

A scatterplot of residuals versus the x-values (residual plot) should be a boring scatterplot. It

shouldn’t show any direction, shape, bends or outliers. It should stretch horizontally with about

the same amount of scatter throughout. (see pg.204 #3)
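A hedged Python sketch of residuals on hypothetical data. It fits the least-squares line, lists each residual (observed minus predicted) next to its x-value as the raw material for a residual plot, and recovers R² as 1 minus the ratio of residual variation to total variation, a standard identity not spelled out in the notes.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]                # hypothetical data
y = [2.0, 4.0, 5.0, 4.0, 5.0]

mx, my = mean(x), mean(y)
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))
b0 = my - b1 * mx

predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed - predicted

ss_res = sum(e ** 2 for e in residuals)       # variation left unexplained
ss_tot = sum((yi - my) ** 2 for yi in y)      # total variation in y
r_squared = 1 - ss_res / ss_tot

for xi, e in zip(x, residuals):
    sign = "negative" if e < 0 else "positive"
    print("x =", xi, "residual =", round(e, 2), sign)
print("R^2 =", round(r_squared, 3))
```

Plotting `residuals` against `x` gives the residual plot; for a suitable linear model it should show no pattern.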
