AMS 5 Lecture 4: Class 4 - More Histograms


Applied Math and Statistics
Yonatan Katznelson

AMS 5 Lecture 4
4/10/2017 (8:00-9:05)

Continuing from Friday: How to Construct Histograms

Example: Histogram for US household incomes from 2015

Table:

Income Level            Frequency      Relative Frequency
$0 - $14,999            14,595,004     11.6%
$15,000 - $24,999       13,210,995     10.5%
$25,000 - $34,999       12,581,900     10.0%
$35,000 - $49,999       15,979,013     12.7%
$50,000 - $74,999       21,011,773     16.7%
$75,000 - $99,999       15,224,099     12.1%
$100,000 - $149,999     17,740,479     14.1%
$150,000 - $199,999      7,800,778      6.2%
$200,000 and over        7,674,959      6.1%

• Example: Starting with the table of income distribution we saw earlier, we first draw the horizontal axis (income in $1000s), marked at 0, 50, 100, 150, 200, 250.
• Using a density scale, we draw a rectangle over each class interval whose area equals the percentage of the households in that interval.
• Note: The height of each rectangle is equal to the percentage of the observations in the corresponding class interval divided by the length of the class interval (the width of the rectangle).
• Next we divide each relative frequency by the width of its class interval (in $1000s):
     11.6/15 = 0.773
     10.5/10 = 1.05
     10/10 = 1
     12.7/15 = 0.847
     …

[Histogram: horizontal axis from 0 to 250 (income in $1000s); vertical axis marked at 0.2, 0.4, 0.6, 0.8 (percent per $1000)]

• The vertical scale here is percent per $1000, i.e., it is the relative frequency (percentage) divided by the width of the interval (which in this case is measured in $1000s). It's always a good idea to label the axes.
• Why use a density scale instead of a percentage or frequency scale?
   o With a percentage or frequency scale, the sizes of the bars would be misleading: the bins for the higher incomes are wider, so they would seem to be much bigger than the others.
   o If the bins have different widths, use the density scale.
• Comment: If all the bins in the distribution have the same width, then the appearance of the histogram will be the same for all three scales. Only the units (and numbers) on the vertical scale will change.
• Example: Distribution of coal (by weight) in Christmas stockings of 40 children at Wool's orphanage.
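Returning to the income table, the density-scale heights can be checked with a short Python sketch that divides each relative frequency by its class width. The function name `density_heights` is hypothetical, and leaving out the open-ended "$200,000 and over" class (which has no upper edge) is an assumption of this sketch, not something the lecture specifies.

```python
# Class intervals in $1000s as (lower edge, upper edge, relative frequency in %),
# taken from the income table. The open-ended "$200,000 and over" class is
# omitted here because it has no upper edge (an assumption of this sketch).
classes = [
    (0, 15, 11.6),
    (15, 25, 10.5),
    (25, 35, 10.0),
    (35, 50, 12.7),
    (50, 75, 16.7),
    (75, 100, 12.1),
    (100, 150, 14.1),
    (150, 200, 6.2),
]


def density_heights(classes):
    """Return the bar height (percent per $1000) for each class interval."""
    return [percent / (upper - lower) for lower, upper, percent in classes]


heights = density_heights(classes)
for (lower, upper, percent), height in zip(classes, heights):
    width = upper - lower
    print(f"${lower}k to ${upper}k: {percent}% / {width} = {height:.3f}% per $1000")
```

Multiplying each height by its width gives back the percentage for that class, which is the sense in which the area of each rectangle equals the relative frequency.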
• In the coal example, the histogram looks the same on the density scale and on the frequency scale because the intervals all have the same length; the only difference is what is measured on the vertical axis.

Statistics and parameters
• Tables, histograms, and other charts are used to summarize large amounts of data. Often, an even more extreme summary is desirable.
   o A number that summarizes population data is called a parameter.
   o A number that summarizes sample data is called a statistic.

Average, Mean, Median
• Average = mean
• Median = middle

Observations
• Population parameters are (more or less) constant.
• Sample statistics vary with the sample, i.e., their values depend on the particular sample chosen. A sample statistic can be thought of as a variable.
   o Ex: Find the average income for 10,000 households. Depending on which 10,000 households you collect data on, the number will change.
• Sample statistics are known because we can compute them from the (available) sample data, while population parameters are often unknown, because data for the entire population is often unavailable.
• One of the most common uses of sample statistics is to estimate population parameters.
   o If you compute the average of a sample (assuming the data is collected correctly), it should be close to the average of the larger population.

Quartiles
1. The number that separates the lowest 25% of the data from the highest 75%.
2. The median, which separates the lowest 50% of the data from the highest 50%.
3. The number that separates the lowest 75% of the data from the highest 25%.
• You could use more intervals than the four that the quartiles give, but quartiles are more commonly used.

Measures of central tendency
• The most extreme way to summarize a list of numbers is with a single, typical value. The most common choices are the mean and median.
   o The mean (average) of a set of numbers is the sum of all the values divided by the number of values in the set.
   o The median of a set of numbers is the middle number, when the numbers are listed in increasing (or decreasing) order.
     The median splits the data into two equally sized sets: 50% of the data lies below the median and 50% lies above.
   o If the number of values in the set is even, then the median is the average of the middle two values.
• The mean and median are different ways of describing the center of the data. Another statistic that is often used to describe the typical value is the mode, which is the most frequently occurring value in the data.
• Example: Find the mean, median, and mode of the following set of numbers: {12, 5, 6, 8, 12, 17, 7, 6, 14, 6, 5, 16}
   o The mean (average): (12+5+6+8+12+17+7+6+14+6+5+16)/12 = 114/12 = 9.5
   o The median. Arrange the data in ascending order, and find the average of the middle two values in this case, since there
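The definitions of mean, median, and mode given in these notes can be tried out on the example set with a short Python sketch. The function names `mean`, `median`, and `modes` are illustrative choices written directly from those definitions; they are not from the lecture.

```python
from collections import Counter

# The example set from the notes.
data = [12, 5, 6, 8, 12, 17, 7, 6, 14, 6, 5, 16]


def mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list; average of the middle two if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """All values that occur most frequently (there can be ties)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(sorted(data))   # [5, 5, 6, 6, 6, 7, 8, 12, 12, 14, 16, 17]
print(mean(data))     # 9.5
print(median(data))   # 7.5
print(modes(data))    # [6]
```

Returning a list from `modes` is a design choice for this sketch, so that a data set with several equally frequent values does not silently report only one of them.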