Chapter 2: Descriptive Statistics
Describing the Shape of a Distribution
Descriptive statistics: The science of describing the important characteristics of a population or sample
o Central tendency: Middle of the data set
o Variability: Spread of the data
o Shape: Distribution of the data set over various values
o Outliers: Unusually large or small data values that lie far from the rest of the data set
Graphical methods: Methods of depicting data sets to study relationships between different variables
Stem-and-leaf display (pg 26): Displays the overall pattern in the data by grouping it into classes
o Shows the variation from class to class, and the amount and distribution of data within each class
o Best for small to moderately sized data distributions
o Steps to creating a stem-and-leaf display:
1. Decide which unit will be used for the stems and the leaves. Choose units for the stems so that there
will be somewhere between 5 and 20 stems.
2. Place the stems in a column with the smallest stem at the top of the column and the largest at the bottom.
3. Enter the leaf for each measurement into the row corresponding to the proper stem. The leaves should
be single-digit numbers (round any value that would otherwise need more than one leaf digit).
4. Rearrange the leaves so that they are in increasing order from left to right.
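The four steps above can be sketched in Python. This is only a minimal illustration under stated assumptions: the data are non-negative whole numbers, the ones digit is the leaf, and everything to its left is the stem. The function name `stem_and_leaf` and the data values are made up for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return (stem, sorted leaves) pairs, smallest stem first.

    Assumes non-negative whole numbers: ones digit = leaf, the rest = stem.
    """
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # step 1: split each value into stem and leaf
        rows[stem].append(leaf)      # step 3: enter the leaf in its stem's row
    # Step 2: list every stem from smallest to largest (empty stems included).
    # Step 4: sort the leaves in each row in increasing order.
    return [(stem, sorted(rows[stem]))
            for stem in range(min(rows), max(rows) + 1)]

# Hypothetical data set, deliberately unsorted.
data = [24, 12, 61, 30, 15, 38, 21, 45, 24, 37]

for stem, leaves in stem_and_leaf(data):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Empty stems are kept in the display so that gaps in the data remain visible.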
Frequency distribution (pg 27 – 32): A table that groups the data into classes and lists the number of measurements in each class
o Frequency: The number of measurements in a class
o Histogram: A graphical portrayal of a data set that shows the data set’s distribution
o Steps to creating a histogram:
1. Find the number of classes.
Number of classes should be the smallest whole number k that makes the quantity $2^k$
greater than the number of measurements.
2. Find the class length.
3. Form non-overlapping classes of equal width.
Lower boundary of the first class: smallest data value
Lower boundary of each subsequent class: upper boundary of the preceding class
Upper boundary of any class: lower boundary of the class + class length
The last class may be an open class, with no upper boundary.
4. Tally and count the number of measurements in each class.
Frequency: The number of measurements in each class
Relative frequency (percent): Proportion of the total number of measurements that fall in the class
Relative frequency distribution: List of all data classes and their relative frequencies
5. Graph the histogram.
Plot each (relative) frequency as the height of a rectangle positioned over the corresponding class.
The x-axis can consist of upper and lower class boundaries, or class midpoints.
Use the class boundaries to separate adjacent rectangles.
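Steps 1 to 4 of the histogram procedure can be sketched in Python as follows. Several details here are assumptions, not taken from the notes: the class length is computed as (largest value − smallest value) / k, a value that falls exactly on a boundary is counted in the higher class, the largest value is counted in the last class, and the data are not all equal. The function name `frequency_distribution` and the data are hypothetical.

```python
def frequency_distribution(data):
    """Return a list of (lower, upper, frequency, relative frequency) per class."""
    n = len(data)

    # Step 1: smallest whole number k with 2**k greater than n.
    k = 1
    while 2 ** k <= n:
        k += 1

    # Step 2: class length (assumed formula: range divided by k).
    smallest, largest = min(data), max(data)
    length = (largest - smallest) / k

    # Step 4: tally the measurements in each class.
    counts = [0] * k
    for x in data:
        index = min(int((x - smallest) / length), k - 1)  # max value -> last class
        counts[index] += 1

    # Step 3: non-overlapping classes of equal width, starting at the smallest value.
    classes = []
    for i, freq in enumerate(counts):
        lower = smallest + i * length
        classes.append((lower, lower + length, freq, freq / n))
    return classes

# Hypothetical data: n = 10, so k = 4 because 2**4 = 16 is the first power above 10.
data = [1, 2, 2, 3, 4, 5, 5, 6, 7, 9]

for lower, upper, freq, rel in frequency_distribution(data):
    print(f"{lower:4.1f} to {upper:4.1f} | {'#' * freq} ({rel:.0%})")
```

The printed rows of `#` marks are a sideways text version of step 5; a plotting library would draw the same counts as rectangles.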
o Normally distributed: Symmetrical bell-shaped normal curve
o Positively skewed: With a tail to the right
o Negatively skewed: With a tail to the left
Dot Plots (pg 33 – 34): A number line with each data value represented above the corresponding scale value
o Useful for detecting outliers (along with stem-and-leaf displays)
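A dot plot can be approximated in plain text. The sketch below assumes whole-number data and stacks one dot per occurrence above each value on the number line; the function name `dot_plot` and the scores are invented for illustration.

```python
from collections import Counter

def dot_plot(values):
    """Return the lines of a text dot plot for whole-number data."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    scale = range(lo, hi + 1)
    lines = []
    # Build rows from the tallest stack of dots down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" . " if counts[v] >= level else "   " for v in scale)
        lines.append(row.rstrip())
    lines.append("".join(f"{v:^3}" for v in scale).rstrip())
    return lines

# Hypothetical scores; 12 sits far from the rest of the data.
scores = [3, 4, 4, 5, 5, 5, 6, 12]
print("\n".join(dot_plot(scores)))
```

In the printed plot, the value 12 appears as a single dot separated from the cluster by a long gap, which is how an outlier shows up in this kind of display.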
Describing Central Tendency
Population parameter: A constant value calculated from all the population measurements that describes some aspect of the population
o Central tendency: The center, or middle, of the data set
o Point estimate: One-number estimate of the value of a population parameter
Sample statistic: A number calculated using the sample measurements that describes some aspect of the sample.
o Since measuring all population units is difficult, samples and estimates are used.
o A descriptive measure of the sample.
Population mean (μ): Average of the population measurements
o Calculated by adding all the population measurements, and dividing the sum by the number of measurements
o Constant value
Sample mean (x-bar): Average of the sample measurements
o $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (where n = sample size, $x_i$ = sample measurements)
o Is the point estimate of the population mean, and is a random variable
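As a small illustration of why the sample mean is a point estimate that varies from sample to sample, the sketch below computes it for two different samples drawn from the same population. The population, the two samples, and the function name `sample_mean` are all hypothetical.

```python
import statistics

def sample_mean(xs):
    """x-bar: sum of the sample measurements divided by the sample size n."""
    return sum(xs) / len(xs)

population = [2, 4, 6, 8, 10, 12]   # hypothetical population
sample_a = [2, 6, 10]               # one possible sample of size 3
sample_b = [4, 8, 12]               # another possible sample of size 3

print(sample_mean(population))      # population mean (a constant)
print(sample_mean(sample_a))        # point estimate from sample A
print(sample_mean(sample_b))        # point estimate from sample B

# The standard library gives the same value for a sample.
print(statistics.mean(sample_a))
```

The population mean stays fixed, while the two samples give different estimates of it.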
Median (M_d): Measurement that divides a population or sample into two roughly equal parts.
o Arrange the measurements of a population or sample in increasing order
o If the number of measurements is odd, median is the middle measurement in the ordering
o If the number of measurements is even, median is the average of the two middle measurements in the ordering
o More resistant to outliers than the mean, and therefore often a better measure of central tendency when the data are skewed or contain outliers
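The median rules above, and the median's resistance to outliers, can be checked with a short sketch. The function name `median` and the data values are made up for the example.

```python
def median(values):
    """Middle measurement if n is odd, average of the two middle ones if n is even."""
    ordered = sorted(values)          # arrange in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

typical = [30, 32, 35, 38, 40]        # hypothetical data, no outlier
with_outlier = [30, 32, 35, 38, 400]  # same data with one extreme value

for data in (typical, with_outlier):
    mean = sum(data) / len(data)
    print(f"mean = {mean}, median = {median(data)}")
```

Replacing 40 with 400 pulls the mean far upward while the median does not move.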
Mode (M_o): Measurement that occurs most frequently in a population or sample
o Bimodal: Exactly two modes
o Multimodal: More than two modes
o When the curve is bell-shaped: mean = median = mode
o When the curve is right skewed: mean > median > mode
o When the curve is left skewed: mean < median < mode
Measurements of Variation
Range: The interval spanned by all of the data
o Largest measurement – smallest measurement
o Poor measure of variation, since it depends only on the two extreme measurements, which may not be representative of the data set
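A brief sketch of how a single extreme measurement dominates the range. The function name `data_range` and the data are hypothetical.

```python
def data_range(values):
    """Largest measurement minus smallest measurement."""
    return max(values) - min(values)

typical = [30, 32, 35, 38, 40]        # hypothetical data, no outlier
with_outlier = [30, 32, 35, 38, 400]  # same data with one extreme value

print(data_range(typical))
print(data_range(with_outlier))
```

Four of the five measurements are identical in the two data sets, yet the range changes drastically.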
Population variance (σ²): Average of the squared deviations of the population measurements from the population mean μ.
o $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (where N = population size)
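The definition of the population variance can be worked through in a few lines of Python. This is a sketch on a made-up population; the function name `population_variance` is hypothetical, and `statistics.pvariance` from the standard library is used only as a cross-check.

```python
import statistics

def population_variance(xs):
    """Average of the squared deviations from the population mean mu."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

# Hypothetical population of N = 8 measurements with mean 5.
population = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_variance(population))
print(statistics.pvariance(population))  # standard-library cross-check
```

For this population the squared deviations are 9, 1, 1, 1, 0, 0, 4, 16, which sum to 32, so the variance is 32 / 8 = 4.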