ST 260 Lecture 3: Stats Chapter 3 Notes

University of Alabama
ST 260

Stats Chapter 3 Notes
Numerical Summaries

Measures of Location

Introduction
• Population of interest: things we wish to learn about
• Key characteristics: typical value and variation
• Parameters: true values for a population

Key Features of Data Distributions
• Shape
• Typical Value
• Spread
• Outliers

First Principles
• Numerical summaries should quantify key characteristics of a data set

Measures of Location/Center
• Mean: works best with symmetric distributions
➢ Sum of the data values divided by the number of data values
• Median: skewed distributions or distributions with outliers
➢ Middle value in the ordered data set
• Mode: categorical variables
➢ The most frequently occurring value
• Trimmed Mean: skewed distributions or distributions with outliers
➢ Average of data values omitting the extremes

Key Concepts
• Relationship between statistics and parameters
• Select appropriate numerical methods
• Calculate common numerical summaries of data

Measures of Variation

Objectives
• Calculate common numerical summaries of data
• Select appropriate numerical methods
• Describe the relationship between statistics and parameters

Measures of Variance
• Data: one continuous quantitative variable
• Sample Variance: symmetric distributions
➢ Sum of (X - average of X)^2 over all data values, divided by (# of X's - 1)
• Sample Standard Deviation: symmetric distributions
➢ Square root of the sample variance
• Range: distributions without outliers
➢ Maximum value - minimum value
• Interquartile Range: distributions with outliers
➢ Q3 - Q1

The Five-Number Summary
• Minimum (min): Smallest data value
• First Quartile (Q1): Upper boundary for lowest 25% of data values
• Second Quartile (Q2 or median): 50th percentile
• Third Quartile (Q3): Lower boundary for largest 25% of data values
• Maximum (max): Largest data value

Measures of Association

Objectives
• Calculate sample variance and correlation
• Relate numerical summaries with graphs

Correlation
• Correlation is a measure of linear association
• Correlation does not necessarily imply causation (cause and effect)
• Just because two variables are highly correlated, it does not mean that one variable is the cause of the other

Correlation Coefficient
• Takes on values between -1 and +1
• Values near -1 indicate a strong negative linear correlation
• Values near +1 indicate a strong positive linear correlation
• Values near 0 indicate a weak linear relationship
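
The summaries in these notes can be tried out on a small data set. The following is a minimal Python sketch, not part of the original lecture: the sample values and the helper names (trimmed_mean, five_number_summary, pearson_r) are made up for illustration. Quartile conventions differ between textbooks and software, so Q1 and Q3 from this sketch may not match a hand calculation done with another rule.

```python
import math
import statistics


def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest values (hypothetical helper)."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this data set")
    return statistics.mean(ordered[k:len(ordered) - k])


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the statistics module's default quartile rule."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


def pearson_r(xs, ys):
    """Sample correlation coefficient; always between -1 and +1."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up sample with one outlier (40), chosen to show how the outlier
# pulls the mean away from the median and trimmed mean.
data = [2, 4, 4, 5, 7, 9, 40]

# Measures of location/center
print("mean:        ", statistics.mean(data))
print("median:      ", statistics.median(data))
print("mode:        ", statistics.mode(data))
print("trimmed mean:", trimmed_mean(data, k=1))

# Measures of variation
minimum, q1, q2, q3, maximum = five_number_summary(data)
print("sample variance:", statistics.variance(data))  # divides by n - 1
print("sample std dev: ", statistics.stdev(data))     # square root of the variance
print("range:          ", maximum - minimum)
print("IQR:            ", q3 - q1)
print("five-number summary:", (minimum, q1, q2, q3, maximum))

# Measures of association: made-up paired data with perfect linear trends
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # increases with x
y_down = [10, 8, 6, 4, 2]  # decreases with x
print("r (increasing):", pearson_r(x, y_up))
print("r (decreasing):", pearson_r(x, y_down))
```

With the outlier in the sample, the mean sits well above the median and the trimmed mean, which is why the notes recommend the median or trimmed mean for skewed data or data with outliers. The same reasoning applies to the range versus the interquartile range.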
