# Detailed textbook notes

University of Toronto Mississauga

Biology

BIO360H5

Helene Wagner

Winter

Description

## Chapter 3: Displaying and Describing Categorical Data

- Bar graphs, pie graphs, and segmented bar graphs display categorical data (not quantitative!).
- Pie graphs and segmented bar graphs must add up to 100%.
- Simpson's paradox: when averages are taken across different groups, they can appear to contradict the overall averages.
- Marginal distribution: divide the row and column totals by the grand total. (It does not tell us anything about the other variable.)
- Joint distribution: divide each cell by the grand total.
- Conditional distribution: divide each cell by its column total (or by its row total, depending on the research question).
- In a contingency table, if the conditional distribution of one variable is the same for all categories of the other, the variables are independent.

## Chapter 4: Displaying and Summarizing Quantitative Data; Chapter 5: Understanding and Comparing Distributions

- Histogram: shows the distribution of a quantitative variable (each bar represents the frequency or relative frequency of values falling in each bin); no gaps between bars, unlike a bar graph!
- The five-number summary reports the median, quartiles and extremes (maximum and minimum): Max, Q3, Median, Q1, Min. All five are summarized in a boxplot, which is used for QUANTITATIVE DATA (a longer boxplot means more variability in the data).
- If the histogram is symmetric with no outliers -> use the mean as the measure of center and the standard deviation as the measure of spread.
- Standard deviation measures how far the data values typically are from the mean; it is the square root of the variance: $s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
- If the histogram is skewed or has outliers -> use the median and the Interquartile Range (IQR): $IQR = Q3 - Q1$ (the IQR contains the middle 50% of the data values).
- Potential outliers: values more than 1.5 x IQR beyond either end of the box.

## Chapter 6: The Standard Deviation as a Ruler and the Normal Model

- Standard Normal Model (z-scores): a z-score is the distance of a data value from the mean, measured in units of standard deviations.
- Standardizing data into z-scores does not change the shape of the distribution.
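
The Chapter 4 to 6 summaries above (five-number summary, IQR, the 1.5 x IQR outlier rule, sample standard deviation, and z-scores) can be sketched together on a small sample. This is a minimal illustration, not the course's method: the data values are made up, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since textbooks and software compute quartiles in slightly different ways.

```python
import statistics

# Hypothetical sample, made up for illustration (not from the notes).
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Five-number summary: Min, Q1, Median, Q3, Max.
# Assumption: "inclusive" quartiles; other conventions can give
# slightly different Q1 and Q3 values.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, median, q3, max(data))

# IQR = Q3 - Q1, and the 1.5 x IQR rule for potential outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_fence or y > upper_fence]

# Sample standard deviation: square root of sum((y - mean)^2) / (n - 1).
mean = statistics.mean(data)
s = statistics.stdev(data)

# Standardize: z = (y - mean) / s. The shape is unchanged, but the
# z-scores have mean 0 and standard deviation 1.
z_scores = [(y - mean) / s for y in data]

print("five-number summary:", five_number)
print("IQR:", iqr, "fences:", (lower_fence, upper_fence))
print("potential outliers:", outliers)
print("mean of z:", round(statistics.mean(z_scores), 6))
print("SD of z:", round(statistics.stdev(z_scores), 6))
```

Because this sample contains a flagged value, the notes would favour the median and IQR as summaries here; the z-scores are computed anyway to show that standardizing yields mean 0 and SD 1.
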
- After standardizing, the center becomes mean = 0 and the spread becomes SD = 1: $N(0, 1)$.
- Normal models are appropriate for distributions whose shape is unimodal and roughly symmetric.
- z-score = (observed − mean) / standard deviation: $z = \dfrac{y - \mu}{\sigma}$
- When comparing two z-scores: the larger a z-score is in magnitude (negative or positive), the more unusual the value. If asked which value is more likely, choose the one whose z-score is closer to 0.
- Negative z-score = data value is below the mean. Positive z-score = data value is above the mean.
- 68-95-99.7 Rule: 68% of values lie within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
- Normal probability (quantile) plot (Q-Q plot): checks the Nearly Normal Condition; a straight line indicates a normal distribution (unimodal and roughly symmetric).

Reading shape from the plots:

- Histogram symmetric and unimodal: boxplot has smaller SD and smaller IQR.
- Histogram right-skewed: the mean is larger than the median (the mean is pulled toward the tail).
  - Boxplot: the upper quartile (Q3) is farther from the median than the lower quartile (Q1), since the data values are more tightly packed in the lower quartile.
  - Normal probability plot: hockey-stick shape pointing left.
- Histogram left-skewed: the median is larger than the mean (data values are more tightly packed in the upper quartile).
  - Boxplot: the upper quartile is closer to the median than the lower quartile (see quiz 5 #2). The median is close to the max, while there is a large gap between the min and the median.

Z-table: always gives lower-tail probabilities.

- Ex. 1: If asked how long the longest 20% of pregnancies last, look up p = 1 − 0.2 = 0.8.
- Ex. 2: What percent will last at least 300 days (meaning > 300 days!)? Find z, then use p = 1 − (table value).

## Chapter 7: Scatterplots, Association and Correlation

- Scatterplot: shows the relationship between two quantitative variables; describe it by direction (+ve or −ve), form (linear or non-linear) and strength (amount of scatter).
- x is the predictor or explanatory variable; y is the response variable.
- P.E. -> result have to run
- Correlation coefficient (r): measures the strength of the linear association between two quantitative variables.
- It is appropriate to use correlation for a scatterplot only if the relationship is linear and there are no outliers.
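
For the Chapter 6 z-table examples, the lower-tail lookups can be sketched with Python's `statistics.NormalDist` in place of a printed table. The notes do not state the mean and standard deviation of pregnancy length, so the values 266 and 16 days below are assumptions chosen only for illustration.

```python
from statistics import NormalDist

# Standard Normal model N(0, 1); cdf(z) plays the role of the z-table
# and returns the lower-tail probability P(Z <= z).
std_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a Normal model within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# 68-95-99.7 Rule, recomputed from the model.
rule = [within(k) for k in (1, 2, 3)]

# Assumed model for pregnancy length in days (not given in the notes).
MEAN_DAYS, SD_DAYS = 266, 16

# Ex. 1: the longest 20% start at the 80th percentile, so look up
# p = 1 - 0.2 = 0.8, then convert the z-score back to days.
z_cut = std_normal.inv_cdf(1 - 0.20)
longest_20_cutoff = MEAN_DAYS + z_cut * SD_DAYS

# Ex. 2: "at least 300 days" is an upper tail, so find z and
# subtract the lower-tail probability from 1.
z_300 = (300 - MEAN_DAYS) / SD_DAYS
p_at_least_300 = 1 - std_normal.cdf(z_300)

print("within 1, 2, 3 SD:", [round(p, 3) for p in rule])
print("z for p = 0.8:", round(z_cut, 2))
print("cutoff for longest 20% (days):", round(longest_20_cutoff, 1))
print("z for 300 days:", z_300)
print("P(at least 300 days):", round(p_at_least_300, 4))
```

With a different assumed mean or SD the day values change, but the two steps stay the same: convert the question into a lower-tail probability, then move between z and the original units.
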
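
For the Chapter 7 correlation coefficient, a minimal sketch follows. The notes do not give a formula for r; the code uses the standard z-score form, r = sum(z_x * z_y) / (n − 1), and the helper name `correlation` and all data values are hypothetical.

```python
import statistics

def correlation(xs, ys):
    """Correlation coefficient r as the sum of products of z-scores over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    products = (
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
        for x, y in zip(xs, ys)
    )
    return sum(products) / (len(xs) - 1)

# Hypothetical data: a roughly linear, positive association.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = correlation(x, y)

# The same data plus one made-up outlier far from the linear pattern.
x_out = x + [20]
y_out = y + [0.0]
r_out = correlation(x_out, y_out)

print("r without outlier:", round(r, 3))
print("r with outlier:", round(r_out, 3))
```

Comparing the two printed values shows why the notes restrict correlation to linear scatterplots with no outliers: a single unusual point can change r substantially.
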
