Detailed textbook notes

Department
Biology
Course Code
BIO360H5
Professor
Helene Wagner

Chapter 3: Displaying and Describing Categorical Data
Bar graphs / pie graphs / segmented bar graphs: display categorical data (not quantitative!)
-Pie graphs and segmented bar graphs must add up to 100%
-Simpson's Paradox: when data from different groups are combined, the overall averages (or
percentages) can appear to contradict the averages within each group
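A small numeric sketch may make Simpson's Paradox easier to see. The counts below are illustrative only (they are not from the course notes), and the names `data`, `rate` and `overall` are made up for this example. Treatment A has the higher success rate inside each group, yet B looks better once the groups are pooled, because the two treatments were applied to very different mixes of mild and severe cases.

```python
# Illustrative counts only: (successes, trials) per treatment and severity group.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def rate(successes, trials):
    """Success proportion for one group."""
    return successes / trials


def overall(groups):
    """Success proportion after pooling all groups of one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return total_successes / total_trials


for treatment, groups in data.items():
    for group, (s, n) in groups.items():
        print(treatment, group, round(rate(s, n), 3))
    print(treatment, "overall", round(overall(groups), 3))
```

Within each group A wins, but the pooled rates point the other way, which is exactly the apparent contradiction the definition describes.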
Marginal Distribution: divide row and column totals by grand total (doesn't tell us anything
about the other variable)
Joint Distribution: divide each cell by grand total
Conditional Distribution: divide each cell by column total (or by row total, depending on the
research question)
-In a contingency table, if the conditional distribution of one variable is the same for every
category of the other, the variables are independent
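The three distributions can be sketched from one small contingency table. The table below is hypothetical (the counts and the names `table`, `row_marginal`, `col_marginal`, `joint` and `cond_given_row` are assumptions for illustration, not part of the notes).

```python
# Hypothetical contingency table of counts (rows: sex, columns: diet).
table = {
    "female": {"omnivore": 30, "vegetarian": 15, "vegan": 5},
    "male": {"omnivore": 40, "vegetarian": 8, "vegan": 2},
}
rows = list(table)
cols = list(table[rows[0]])
grand_total = sum(table[r][c] for r in rows for c in cols)

# Marginal distributions: row or column totals divided by the grand total.
row_marginal = {r: sum(table[r].values()) / grand_total for r in rows}
col_marginal = {c: sum(table[r][c] for r in rows) / grand_total for c in cols}

# Joint distribution: every cell divided by the grand total.
joint = {(r, c): table[r][c] / grand_total for r in rows for c in cols}

# Conditional distribution of diet within each sex: cell divided by its row total.
cond_given_row = {
    r: {c: table[r][c] / sum(table[r].values()) for c in cols} for r in rows
}

print(row_marginal)
print(col_marginal)
print(cond_given_row)
```

Here the conditional distributions of diet differ between the two rows, so in this made-up table the variables would not look independent.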
Chapter 4: Displaying and Summarizing Quantitative Data
Chapter 5: Understanding and Comparing Distributions
Histogram: shows distribution of a quantitative variable (each bar represents the frequency or
relative frequency of values falling in each bin); no gaps unlike bar graph!
The five-number summary reports a distribution's median, quartiles and extremes (maximum and
minimum): Max, Q3, Median, Q1, Min. All are summarized in a boxplot: used for QUANTITATIVE DATA
(longer boxplot means more variability in data)
If the histogram is symmetric and no outliers -> use mean for measure of center and standard
deviation for measure of spread
Standard deviation: measures how far the data values typically are from the mean
- the square root of the variance: $s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
If the histogram is skewed or has outliers -> use median and Interquartile Range (IQR)
IQR = Q3 – Q1 (the IQR contains the middle 50% of the data values)
Potential outliers: if beyond 1.5 x IQR from either end of the box
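The summaries above can be sketched with Python's standard `statistics` module. The data values are hypothetical, and the quartile method (`method="inclusive"`) is one convention among several; textbooks and software can give slightly different quartiles for the same data.

```python
import math
import statistics

values = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data with one large value

# Quartiles (one common convention), then IQR and the 1.5 x IQR outlier fences.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [y for y in values if y < low_fence or y > high_fence]

five_number = (min(values), q1, median, q3, max(values))

# Standard deviation from the formula: sqrt( sum (y - ybar)^2 / (n - 1) ).
mean = statistics.mean(values)
s_manual = math.sqrt(sum((y - mean) ** 2 for y in values) / (len(values) - 1))
s = statistics.stdev(values)  # library version of the same sample SD

print(five_number, iqr, outliers)
print(round(mean, 2), round(s, 2))
```

Because of the single large value, the mean is pulled above the median here, which is the situation where the notes recommend the median and IQR instead of the mean and standard deviation.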
Chapter 6: The Standard Deviation as a ruler and the Normal Model
Standard Normal Model (z-scores): the distance of each data value from the mean, measured in
units of standard deviations. Standardizing data into z-scores does not change the shape of the
distribution. The center becomes mean = 0 and the spread becomes SD = 1: N(0,1)
www.notesolution.com
-Normal models are appropriate for distributions whose shape is unimodal and roughly
symmetric
z-score = (observed – mean) / standard deviation: $z = \dfrac{y - \mu}{\sigma}$
When comparing two z-scores: the larger a z-score is in absolute value (negative or positive), the more unusual the value is.
If asked which value is "more likely", choose the one whose z-score is smaller in absolute value (closest to 0)
Negative z-score = data value is below the mean
Positive z-score = data value is above the mean
68- 95- 99.7 Rule – 68% of values lie within 1 standard deviation from the mean, 95% within 2
and 99.7% within 3.
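As a quick check of the z-score formula and the 68-95-99.7 Rule, here is a minimal sketch using `statistics.NormalDist` from the Python standard library. The helper names `z_score` and `within`, and the example numbers, are assumptions made up for illustration.

```python
from statistics import NormalDist


def z_score(y, mean, sd):
    """Distance of y from the mean, in units of standard deviations."""
    return (y - mean) / sd


std_normal = NormalDist(0, 1)  # the standard Normal model N(0, 1)


def within(k):
    """Proportion of a Normal model lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Hypothetical example: a value of 85 when the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))
for k in (1, 2, 3):
    print(k, round(within(k), 4))
```

The three proportions come out close to 68%, 95% and 99.7%, which is where the rule's rounded figures come from.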
Normal probability (quantile) plot (Q-Q plot): checks the Nearly Normal Condition; a straight
line indicates a roughly Normal distribution (unimodal and roughly symmetric)
Histogram Symmetrical and unimodal : boxplot has smaller SD and smaller IQR
Histogram right-skewed: mean is larger than the median (the mean is pulled toward the tail)
from looking at the boxplot: the upper quartile (Q3) is farther from the median than the lower
quartile (Q1), since the values are packed more tightly at the low end
from looking at the normal probability plot: hockey-stick shape pointing left
Histogram left-skewed: median is larger than the mean (the values are packed more tightly at the high end)
Boxplot: upper quartile is closer to the median than the lower quartile (see quiz 5 #2)
The median is closer to the max, while there is a large gap between the min and the median.
Z-table: always gives lower-tail probabilities (the area to the left of z). Ex. 1: if asked to
find how long the longest 20% of pregnancies last, look up p = 1 – 0.2 = 0.8
Ex. 2: what percent will last at least 300 days (meaning >300 days!)? Find z, then use p = 1 – (table value)
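The two table look-ups can be sketched with `statistics.NormalDist`. The notes do not state the model's parameters, so the mean of 266 days and SD of 16 days below are an assumption chosen only to make the example runnable; substitute the values given in the actual question.

```python
from statistics import NormalDist

# ASSUMED parameters (not given in the notes): mean 266 days, SD 16 days.
pregnancy = NormalDist(mu=266, sigma=16)

# Ex. 1: the longest 20% start at the value with 80% of the area below it.
cutoff_days = pregnancy.inv_cdf(1 - 0.20)

# Ex. 2: "at least 300 days" is an upper tail, so subtract the lower tail from 1.
p_at_least_300 = 1 - pregnancy.cdf(300)

print(round(cutoff_days, 1))
print(round(p_at_least_300, 4))
```

`inv_cdf` plays the role of reading the z-table backwards (probability to value), and `cdf` plays the role of reading it forwards (value to lower-tail probability).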
Chapter 7: Scatterplots, Association and Correlation
Scatterplot: relationship between two quantitative variables; describe by direction (+ve or –ve),
form (linear or non-linear) and strength (amount of scatter)
x (predictor or explanatory variable), y (response variable) P.E. -> result have to run
Correlation Coefficient (r): measures the strength of linear association between two quantitative
variables. Appropriate to use correlation if the scatterplot is linear and has no outliers
Correlation is 1) not affected by rescaling, 2) unitless, 3) sensitive to outliers, 4) always
between -1 and 1
-Even if there is a relationship (high correlation coefficient), it doesn't prove causation
-Correlation always has the same sign as the slope of the regression line
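The listed properties of r can be checked on a small made-up data set. The function `pearson_r` and the data are illustrative assumptions, written out by hand so the calculation is visible.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]              # hypothetical predictor values
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical response values

r = pearson_r(x, y)

# Rescaling (new units, shifted origin) leaves r unchanged.
r_rescaled = pearson_r([10 * v + 3 for v in x], [v / 100 for v in y])

# A single outlier can change r dramatically.
r_with_outlier = pearson_r(x + [6], y + [-20])

print(round(r, 4), round(r_rescaled, 4), round(r_with_outlier, 4))
```

Note that the rescaling here uses positive multipliers; multiplying one variable by a negative number would flip the sign of r.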
Spearman's rho: for bent (curved but consistently increasing or decreasing) relationships. For
each variable, replace the values by their ranks and calculate the correlation between the ranks.
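The rank-then-correlate recipe can be sketched as follows. The helper names (`ranks`, `pearson_r`, `spearman_rho`) and the data are assumptions for illustration; tied values receive the average of their ranks, which is a common convention.

```python
import math


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson_r(xs, ys):
    """Ordinary correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Correlation between the ranks of xs and the ranks of ys."""
    return pearson_r(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # hypothetical bent but steadily increasing relationship

print(round(pearson_r(x, y), 3), round(spearman_rho(x, y), 3))
```

For this curved but steadily increasing pattern, rho reaches 1 while r stays below 1, since r only measures linear association.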
If non-linear, can use transformations to re-express the data and make the relationship more linear
Chapter 8: Linear Regression
Linear model
Response = Intercept + Slope * predictor + Error (Residual) term: y = b0 + b1x + e
-The line minimizes the sum of the squared vertical distances from the points to the line
-Predictions based on the regression line are for average values of y for a given x. The
actual ‘y’ will vary around the predicted value
Slope: for every 1-unit increase in x, the response y is expected to change by b1 units of y
Y-intercept: the predicted value of y when x = 0
-b1 = r(sy/sx), where b1 = slope, r = correlation, sy = standard deviation of the y variable, sx = standard deviation of the x variable
R² (squared correlation): the fraction of the data's variance accounted for by the linear model.
X variable explains __% (R-Sq value) of the variation in Y variable. However, individual
predictions may still be far off.
-If you change the units of the y-variable: the slope of the regression changes (b/c the units
have changed) but the correlation and the R² value do not change
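The slope formula and the behaviour of R² under a change of units can be sketched on a small hypothetical data set. The function name `fit_line` is made up for this example, and the intercept is computed as b0 = ȳ − b1·x̄, a standard least-squares fact that is not stated in the notes.

```python
import math


def fit_line(xs, ys):
    """Return (b0, b1, r) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx           # slope = r * (sy / sx)
    b0 = mean_y - b1 * mean_x  # the line passes through (x-bar, y-bar)
    return b0, b1, r


x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical response values

b0, b1, r = fit_line(x, y)
r_squared = r ** 2

# Changing the units of y (multiplying by 100) changes the slope but not r or R^2.
b0_scaled, b1_scaled, r_scaled = fit_line(x, [100 * v for v in y])

print(round(b0, 3), round(b1, 3), round(r_squared, 3))
print(round(b1_scaled, 3), round(r_scaled ** 2, 3))
```

In this made-up data set about 60% of the variation in y is accounted for by x, and that percentage stays the same after the change of units even though the slope becomes 100 times larger.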
Residuals = observed value (collected data) - predicted value : ei = yi - ŷi
- negative residual: observed < predicted
A scatterplot of residuals versus the x-values (residual plot) should be a boring scatterplot. It
shouldn't show any direction, shape, bends or outliers. It should stretch horizontally with about
the same amount of scatter throughout. (see pg.204 #3)
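Residuals are simple to compute once a line is in hand. In this sketch the data are hypothetical, and the coefficients b0 = 2.2 and b1 = 0.6 are assumed to be the least-squares fit for those data; the function name `residuals` is made up for illustration.

```python
def residuals(xs, ys, b0, b1):
    """Observed minus predicted value for each point: e_i = y_i - y-hat_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical response values
b0, b1 = 2.2, 0.6    # assumed least-squares coefficients for these data

e = residuals(x, y, b0, b1)

for xi, ei in zip(x, e):
    # A negative residual means the observed value is below the prediction.
    print(xi, round(ei, 2), "observed < predicted" if ei < 0 else "observed >= predicted")

print(round(sum(e), 10))  # least-squares residuals sum to about 0
```

Plotting `e` against `x` would give the residual plot described above; with only five points it says little, but the same pairs feed any plotting tool.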