Description
Chapter 1

Def’n: Statistics:
1) are commonly known as numerical facts
2) is a field of discipline or study
Statistics is about variation.

3 main aspects of statistics:
1) Design (“Think”): Planning how to obtain data to answer questions.
2) Description (“Show”): Summarizing the obtained data.
3) Inference (“Tell”): Making decisions and predictions based on data.

Chapter 2 - Data

Def’n: A population consists of all elements whose characteristics are being studied.
Ex2.1) [...] students
A sample is a portion of the population selected for study.
Ex2.2) UofA students in this section
A parameter is a summary measure calculated for population data.
A statistic is a summary measure calculated for sample data.

Types of statistics:
Descriptive: methods to view a given dataset.
→ averages, histograms
Inferential: methods using sample results to infer conclusions about a larger population.
→ t-tests, simple linear regression

Def’n: A variable is any characteristic that is recorded for subjects in a study.
- Qualitative (categorical): cannot assume a numerical value but is classifiable into 2 or more non-numeric categories.
  → gender, smell, grades
- Quantitative (numerical): measured numerically.
  - Discrete: only certain values with no intermediate values.
    → integers, grades
  - Continuous: any numerical value over a certain interval or intervals.
    → GPA, gas prices

Chapter 3 – Categorical Data Graphs

Def’n: A frequency table (for qualitative data) is a listing of possible values for a variable, together with the # of observations for each value.

Major      Frequency (f)   Relative frequency   Percentage (%)
Science
Arts
Business
Nursing
Other

Relative frequency = $f / \sum f$
Percentage = Relative frequency × 100%

Graphical Summaries

Def’n: A bar chart is a graph of bars whose heights represent the (relative) frequencies of respective categories.
Ex3.1) (preceding table used in class)
Look for: frequently and infrequently occurring categories.
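The relative frequency and percentage formulas for a frequency table can be sketched in a few lines of Python. The counts below are hypothetical: the notes do not record the frequencies used in class, so the numbers exist only to show the arithmetic. The function name `frequency_table` is likewise an illustrative choice.

```python
# HYPOTHETICAL counts per major (not from the notes; for illustration only).
counts = {"Science": 12, "Arts": 8, "Business": 6, "Nursing": 3, "Other": 1}


def frequency_table(freqs):
    """Return {category: (f, relative frequency, percentage)}."""
    total = sum(freqs.values())  # sum of f over all categories
    table = {}
    for category, f in freqs.items():
        rel = f / total          # relative frequency = f / sum(f)
        pct = rel * 100          # percentage = relative frequency x 100%
        table[category] = (f, rel, pct)
    return table


table = frequency_table(counts)

print(f"{'Major':<10}{'f':>4}{'Rel. freq.':>12}{'%':>8}")
for category, (f, rel, pct) in table.items():
    print(f"{category:<10}{f:>4}{rel:>12.4f}{pct:>8.1f}")
```

Whatever counts are used, the relative frequencies should sum to 1 and the percentages to 100%, which is a quick check on a hand-built table.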
A pie chart is a circle divided into portions that represent the (relative) frequencies of different categories.
Ex3.2) (preceding table used in class)
Look for: categories that form large and small proportions of the data set.

A segmented bar chart uses a rectangular bar divided into segments that represent the frequency or relative frequency of different categories.
Ex3.3) (preceding table used in class)

Chapter 4 – Numerical Variable Graphs

Def’n: A stem-and-leaf display has each value divided into two portions: a stem and a leaf. The leaves for each stem are shown separately. (Values should be ranked.)
Look for:
- typical values and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks

Ex4.1) U.S. Box Office for weekend of Dec. 27 – 29, 2013
29.0 28.6 19.7 18.7 18.4 13.5 12.8 10.1 9.9 7.3

0 | 7.3 9.9
1 | 0.1 2.8 3.5 8.4 8.7 9.7
2 | 8.6 9.0

Note: Dotplots also exist (see p. 52 in textbook), but “replace” the values with dots.

Def’n: A histogram, like a bar graph, graphically shows a frequency distribution. The data here, however, are quantitative.
Look for:
- central or typical value and corresponding spread
- gaps in the data or outliers
- presence of symmetry in the distribution
- number and location of peaks
The data are divided into intervals (normally of equal width).

Cumulative Relative Frequency = (Cumul. freq. of a class) / (Total obs’ns in dataset)

Table 4X0 – Total earnings as of Jan. 7/2014 (551 films)

Worldwide Box Office   Number of movies   Relative    Cumulative
(in millions)          f                  frequency   rel. freq.
200 to 599             466                0.8457      0.8457
600 to 999              68                0.1234      0.9691
1000 to 1399            14                0.0254      0.9946
1400 to 1799             1                0.0018      0.9964
1800 to 2199             1                0.0018      0.9982
2200 to 2599             0                0.0000      0.9982
2600 to 3000             1                0.0018      1.0000

Ex4.2) (drawn in class using above data)

NOTE: Dotplots and stem-and-leaf displays are good for small data sets because the data values are retained. Histograms are better for large data sets because they condense the data.
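The relative and cumulative relative frequency columns of Table 4X0 can be reproduced from the movie counts alone. The following is a minimal sketch: the class labels and counts are copied from the table, while the function name `cumulative_relative_frequencies` is an illustrative choice, not something from the notes.

```python
from itertools import accumulate

# Class intervals (in millions) and counts f, copied from Table 4X0.
classes = [
    "200 to 599", "600 to 999", "1000 to 1399", "1400 to 1799",
    "1800 to 2199", "2200 to 2599", "2600 to 3000",
]
freqs = [466, 68, 14, 1, 1, 0, 1]


def cumulative_relative_frequencies(freqs):
    """Cumulative frequency of each class divided by the total observations."""
    total = sum(freqs)
    return [cum / total for cum in accumulate(freqs)]


total = sum(freqs)                      # total observations in the dataset
rel = [f / total for f in freqs]        # relative frequency of each class
cum_rel = cumulative_relative_frequencies(freqs)

for label, f, r, c in zip(classes, freqs, rel, cum_rel):
    print(f"{label:<14}{f:>5}{r:>9.4f}{c:>9.4f}")
```

Dividing the running count by the total, rather than summing already-rounded relative frequencies, avoids accumulating rounding error down the column.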
Histogram shapes/traits: (corresponding figures drawn in class)
1. Modes (unimodal, bimodal, multimodal, uniform)
2. Skewness (symmetric, left-skewed & right-skewed)
   → term refers to “TAIL”
3. Tail weight (normal, heavy-tailed, light-tailed)

Def’n: A timeplot is a graph of data collected over time (or a time series).
Look for:
- a trend over time, denoting a decrease or increase.
- a pattern repeating at regular intervals (a cycle or seasonal variation)
Ex4.3) (drawn in class)

Chapters 4/5 – Summary measures (and one more graph)

Measures of Center

Def’n: An outlier is an obs’n that falls well above or below the overall bulk of the data.

Population mean: $\mu = \frac{\sum y_i}{N}$
Sample mean: $\bar{y} = \frac{y_1 + y_2 + \dots + y_n}{n} = \frac{\sum y_i}{n}$

The median is the value of the midpoint of a data set that has been ranked in order, increasing or decreasing. If the dataset has an even # of observations, use the average of the middle 2 values.
Note: the median is resistant to outliers; the mean uses all observations.

Table 5X0 – Estimated provincial populations circa Jul. 2011 (in millions)

ON      PQ     BC     AB     MB     SK     NS     NB     NL     PEI
13.373  7.980  4.573  3.779  1.250  1.058  0.945  0.756  0.511  0.146

Ex5.1)
Avg. pop’n of all provinces: $\mu = \frac{\sum y_i}{N} = \frac{13.373 + \dots + 0.146}{10} = 3.437$
Avg. pop’n from sample of 3 provinces: $\bar{y} = \frac{\sum y_i}{n} = \frac{4.573 + 3.779 + 1.250}{3} = 3.201$
Median (all 10 provinces, middle 2 values): $\frac{1.250 + 1.058}{2} = 1.154$
Outlier effect? (remove Ontario & Quebec): $\bar{y} = \frac{\sum y_i}{n} = \frac{4.573 + \dots + 0.146}{8} = 1.627$

Comparing Mean and Median: (corresponding figures drawn in class)
1. Symmetric curve & histogram - the 2 are identical, lie at center of distribution
2.
