# Ch2.pdf

School: University of Alberta
Department: Statistics
Course: STAT151
Professor: Paul Cartledge
Semester: Fall

Description
## 2.1 Types of Data

Def’n: A variable is any characteristic that is recorded for subjects in a study.

- Qualitative (categorical): cannot assume a numerical value but is classifiable into 2 or more non-numeric categories. e.g. gender, smell, grades
- Quantitative (numerical): measured numerically.
  - Discrete: only certain values with no intermediate values. e.g. integers, grades
  - Continuous: any numerical value over a certain interval or intervals. e.g. GPA, gas prices

Def’n: A frequency table (for qualitative data) is a listing of possible values for a variable, together with the # of observations for each value.

| Major | Frequency (f) | Relative frequency | Percentage (%) |
|---|---|---|---|
| Science | | | |
| Arts | | | |
| Business | | | |
| Nursing | | | |
| Other | | | |

$$\text{Relative frequency} = \frac{f}{\sum f}, \qquad \text{Percentage} = (\text{Relative frequency}) \times 100$$

## 2.2 Graphical Summaries

Def’n: A pie chart is a circle divided into portions that represent the relative frequencies of the different categories. e.g. (above table used in class)

Look for: categories that form large and small proportions of the data set.

A bar graph displays vertical bars whose heights represent the frequencies of the respective categories. e.g. (above table used in class)

Look for: frequently and infrequently occurring categories.

Graphs for quantitative variables:

Def’n: A stem-and-leaf plot has each value divided into two portions: a stem and a leaf. The leaves for each stem are shown separately in a display. (Values should be ranked.)

Look for:

- typical values and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks

Ex2.1) U.S. Box Office for weekend of January 2, 2011

25.8, 24.4, 18.8, 12.4, 10.3, 10.0, 9.8, 9.3, 8.9, 7.8

| Stem | Leaves |
|---|---|
| 0 | 7.8 8.9 9.3 9.8 |
| 1 | 0.0 0.3 2.4 8.8 |
| 2 | 4.4 5.8 |

A comparative S-and-L plot has a common stem to compare two related distributions:

| Leaves | Stem | Leaves |
|---:|:---:|:---|
| 9.8 8.4 7.3 7.0 6.8 5.2 4.8 | 0 | 7.8 8.9 9.3 9.8 |
| 5.0 3.8 0.7 | 1 | 0.0 0.3 2.4 8.8 |
| | 2 | 4.4 5.8 |

Note: Dot plots also exist (see p. 33 in textbook), but “replace” the values with dots.

Def’n: A histogram, like a bar graph, graphically shows a frequency distribution. The data here, however, are quantitative.

Look for:

- central or typical value and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks

The data are divided into intervals (normally of equal width).

$$\text{Cumulative relative frequency} = \frac{\text{cumulative frequency of an interval}}{\text{total obs’ns in data set}}$$

Total earnings as of Jan. 9/2011 (421 films)

| Worldwide Box Office (in millions) | Number of movies (f) | Relative frequency | Cumulative rel. freq. |
|---|---|---|---|
| 200 to 599 | 365 | 0.867 | 0.867 |
| 600 to 999 | 49 | 0.116 | 0.983 |
| 1000 to 1399 | 5 | 0.012 | 0.995 |
| 1400 to 1799 | 0 | 0 | 0.995 |
| 1800 to 2199 | 1 | 0.002 | 0.997 |
| 2200 to 2599 | 0 | 0 | 0.997 |
| 2600 to 3000 | 1 | 0.002 | 0.999 |

The cumulative column adds up the rounded relative frequencies, so it ends at 0.999 rather than exactly 1.

Ex2.2) (drawn in class using above data)

NOTE: Dot and S-and-L plots are good for small data sets because data values are retained. Histograms are better for large data sets because they condense the data.

Histogram traits: (corresponding figures drawn in class)

1. Modes (unimodal, bimodal, multimodal, uniform)
2. Skewness (symmetric, left-skewed & right-skewed) → the term refers to the “TAIL”
3. Tail weight (normal, heavy-tailed, light-tailed)

Def’n: A time plot is a graph of data collected over time (or a time series).

Look for:

- a trend over time, denoting a decrease or increase.
- a pattern repeating at regular intervals (a cycle or seasonal variation)

Ex2.3) (drawn in class)

## 2.3 Measures of Center

Def’n: An outlier is an obs’n that falls well above or below the overall bulk of the data.

Population mean: $\mu = \dfrac{\sum x_i}{N}$

Sample mean: $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n} = \dfrac{\sum x_i}{n}$

The median is the value at the midpoint of a data set that has been ranked in order, increasing or decreasing. If the data set has an even # of observations, use the average of the middle 2 values.

Note: the median is resistant to outliers; the mean uses all observations.

Table 2X0: Estimated provincial populations circa Oct. 2007 (in millions)

| ON | PQ | BC | AB | MB | SK | NS | NB | NFL | PEI |
|---|---|---|---|---|---|---|---|---|---|
| 12.851 | 7.720 | 4.403 | 3.487 | 1.190 | 1.003 | 0.935 | 0.751 | 0.507 | 0.139 |

Ex2.4)

Avg. pop’n of all provinces: $\mu = \dfrac{\sum x_i}{N} = \dfrac{12.851 + \dots + 0.139}{10} = 3.299$

Avg. pop’n from sample of 3 provinces: $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{4.403 + 3.487 + 1.190}{3} = 3.027$

$\text{median} = \dfrac{1.190 + 1.003}{2} = 1.097$

Outlier effect? (remove Ontario & Quebec) $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{4.403 + \dots + 0.139}{8} = 1.552$

Comparing Mean and Median: (corresponding figures drawn in class)

1. Symmetric curve & histogram: the 2 are identical and lie at the center of the distribution
2. Right-skewed: Median < Mean
3. Left-skewed: Mean < Median

Def’n: The mode is the most frequent value in a data set.

Ex2.5) Provinces → no mode; Movies → 200 to 599

## 2.4 Measures of Spread

Def’n: Range = largest value – smallest value = max – min

Ex2.6) (from Table 2X0) range = 12.851 – 0.139 = 12.712

Deviations from the Mean:

Ex2.7) 1, 2, 4, 3 xi
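
The arithmetic in Ex2.4 and Ex2.6 can be checked with a short script. This is only a minimal sketch using Python's standard `statistics` module; the variable names (`populations`, `mu`, `x_bar`, `trimmed`, and so on) are illustrative assumptions and not part of the course notes, and the province-to-value pairing follows the column order of Table 2X0.

```python
from statistics import mean, median

# Estimated provincial populations circa Oct. 2007, in millions (Table 2X0).
populations = {
    "ON": 12.851, "PQ": 7.720, "BC": 4.403, "AB": 3.487, "MB": 1.190,
    "SK": 1.003, "NS": 0.935, "NB": 0.751, "NFL": 0.507, "PEI": 0.139,
}
values = list(populations.values())

# Population mean: uses all N = 10 provinces (Ex2.4).
mu = mean(values)

# Sample mean: the n = 3 values 4.403, 3.487, 1.190 used in Ex2.4.
sample = [populations[p] for p in ("BC", "AB", "MB")]
x_bar = mean(sample)

# Median: N is even, so median() averages the two middle ranked values.
med = median(values)

# Outlier effect: drop Ontario and Quebec, then recompute the mean.
trimmed = [v for p, v in populations.items() if p not in ("ON", "PQ")]
x_bar_trimmed = mean(trimmed)

# Range = max - min (Ex2.6).
data_range = max(values) - min(values)

print(f"population mean       = {mu:.4f}")
print(f"sample mean (n = 3)   = {x_bar:.4f}")
print(f"median                = {med:.4f}")
print(f"mean without ON, PQ   = {x_bar_trimmed:.4f}")
print(f"range                 = {data_range:.3f}")
```

Dropping the two largest provinces cuts the mean roughly in half, while the median of the full data set was already close to the trimmed mean, which illustrates the note that the median is resistant to outliers and the mean is not.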