Statistics: The science of learning from data.
Exploratory Data Analysis: Uses graphs and numerical summaries to describe the variables in a data set and the relations among them.
1. Begin by examining each variable by itself. Then move on to study the relationships among the variables.
2. Begin with graphs. Then add numerical summaries of specific aspects of the data.
Data: Data are numerical facts. Data are numbers with a context, and we need to understand the context if we are to make sense of the numbers.
Individuals: The objects described in a set of data. Individuals are sometimes people.
Cases: When the objects that we want to study are not people, we often call them cases.
Variable: Any characteristic of an individual. A variable can take different values for different individuals.
Categorical Variable: Places an individual into one of two or more groups or categories.
Quantitative Variable: Takes numerical values for which arithmetic operations, such as adding and averaging, make sense.
Distribution of a Categorical Variable: Lists the categories and gives either the count or the percent of individuals who fall in each category.
Bar Graph (describes the distribution of a categorical variable): The heights of the bars compare the percents of each category.
-Easier to read and more flexible than a pie chart.
Pie Chart (describes the distribution of a categorical variable): Helps to see what part of the whole group each category forms.
-Because pie charts lack scales, add the percents to the labels for each slice.
-Requires that you include all the categories that make up a whole.
-Use them only when you want to emphasize each category's relation to the whole.
Tails: The extreme values of a distribution are in the tails of the distribution.
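To make the categorical-variable entries concrete, here is a minimal Python sketch that turns raw category labels into the count-and-percent table that a bar graph or pie chart is drawn from. The color data and the function name `categorical_distribution` are made up for illustration. The "#" bars are only a text stand-in for a real bar graph.

```python
from collections import Counter


def categorical_distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {cat: (count, 100 * count / n) for cat, count in counts.items()}


# Hypothetical data: favorite color reported by 8 individuals.
colors = ["red", "blue", "blue", "green", "red", "blue", "red", "blue"]
dist = categorical_distribution(colors)

# Text "bar graph": each bar's length is proportional to the category count.
for cat, (count, pct) in sorted(dist.items()):
    print(f"{cat:6s} {'#' * count:8s} {count} ({pct:.1f}%)")

# A pie chart needs every category, so the percents must add up to 100.
print(sum(pct for _, pct in dist.values()))  # prints 100.0
```

The percents (blue 50%, red 37.5%, green 12.5%) are what you would write on the pie-chart slice labels.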
Distribution: The distribution of a variable tells us what values it takes and how often it takes these values.
Rate: Often, the rate at which something occurs is more meaningful than the simple count of occurrences.
Stemplots (describe the distribution of a quantitative variable): Give a quick picture of the shape of a distribution while including the actual numerical values in the graph.
-Also called a stem-and-leaf plot.
-Work best for small numbers of observations that are all greater than 0.
-Do not work well for large data sets.
Back-to-back Stemplot: Compares two related distributions by using a common stem, with the leaves for one distribution on the right and the leaves for the other on the left.
Histograms (describe the distribution of a quantitative variable): Break the range of values of a variable into classes and display only the count or percent of the observations that fall into each class.
-Plot the frequencies (counts) or the percents of equal-width classes of values.
-A histogram shows the distribution of counts or percents among the values of a single variable. A bar graph compares the sizes of different items.
Frequencies: The number of individuals in each class.
-The base of each bar covers the class, and the height is the class count.
-Because all the bars have equal width, the area of each bar is proportional to its height.
Mean: The average value.
-A measure of the center of a distribution.
-To find the mean of a set of observations, add their values and divide by the number of observations.
Median: The middle value or midpoint.
-A measure of the center of a distribution.
-Half of the observations are smaller than the median and half are larger.
-First arrange all the values from smallest to largest.
-(n+1)/2 = position of the median in the ordered list.
-Also known as the 50th percentile.
-(see notation)
Resistant Measure: A measure that is not sensitive to the influence of a few extreme observations (e.g., outliers).
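The mean, median, histogram, and stemplot entries above can be tried out with a short Python sketch. It assumes a small list of made-up integer scores. The helper names (`mean`, `median`, `histogram_counts`, `stemplot`) and the class boundaries (start 50, width 10) are illustrative choices and are not part of these notes. The last lines show why the median is called resistant and the mean is not.

```python
def mean(xs):
    # Add the values and divide by the number of observations.
    return sum(xs) / len(xs)


def median(xs):
    # First arrange the values from smallest to largest.
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        # Odd n: the median is the observation at (1-based) position (n + 1) / 2.
        return s[(n + 1) // 2 - 1]
    # Even n: (n + 1) / 2 falls halfway between the two middle observations.
    return (s[n // 2 - 1] + s[n // 2]) / 2


def histogram_counts(xs, start, width, k):
    # Frequencies for k equal-width classes [start + i*width, start + (i+1)*width).
    # Values outside these classes are ignored in this sketch.
    counts = [0] * k
    for x in xs:
        i = int((x - start) // width)
        if 0 <= i < k:
            counts[i] += 1
    return counts


def stemplot(xs):
    # Stem = all but the last digit, leaf = last digit (non-negative integers only).
    stems = {}
    for x in sorted(xs):
        stems.setdefault(x // 10, []).append(x % 10)
    return [
        f"{stem:>3} | {''.join(str(leaf) for leaf in stems.get(stem, []))}"
        for stem in range(min(stems), max(stems) + 1)
    ]


# Hypothetical data set (e.g., ten exam scores).
scores = [58, 62, 67, 71, 73, 75, 78, 84, 86, 91]
print(mean(scores), median(scores))                          # 74.5 74.0
print(histogram_counts(scores, start=50, width=10, k=5))     # [1, 2, 4, 2, 1]
print("\n".join(stemplot(scores)))

# Resistance: replacing one value with an extreme outlier moves the mean a lot,
# but leaves the median unchanged.
with_outlier = scores[:-1] + [910]
print(mean(with_outlier), median(with_outlier))              # 156.4 74.0
```

The stemplot prints stems 5 through 9 with leaves `8`, `27`, `1358`, `46`, and `1`. Reading those leaf rows sideways gives the same shape as the class counts `[1, 2, 4, 2, 1]`, which illustrates why a stemplot works like a histogram that keeps the actual values.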
Outliers: Observations that lie outside the overall pattern of a distribution.
-An outlier is an individual value that falls outside the overall pattern.
Midpoint of a Distribution: The value with roughly half the observations taking smaller values and half taking larger values.
Modes: Major peaks in the graph of a distribution.
-A distribution with one major peak is called unimodal.
Symmetric Distribution: The values smaller and larger than its midpoint are mirror images of each other.
Skewed to the Right: The right tail (larger values) of the distribution is much longer than the left tail (smaller values).
Spread: The simplest useful numerical description of a distribution consists of both a measure of center and a measure of spread.
First Quartile - Q1: The median of the observations whose position in the ordered list is to the left of the location of the overall median.
-Remember to first arrange the observations in increasing order.
-(n)(0.25) gives the approximate position of Q1.
Third Quartile - Q3: The median of the observations whose position in the ordered list is to the right of the location of the overall median.
-Remember to first arrange the observations in increasing order.
-(n)(0.75) gives the approximate position of Q3.
Time Plots: Plot each observation against the time at which it was measured. Always put time on the horizontal scale of your plot and the variable you are measuring on the vertical scale. Connecting the data points by lines helps emphasize any changes over time.
Five-Number Summary: Consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest.
-In symbols: Minimum Q1 M Q3 Maximum
-Leads to another visual representation of a distribution of a
