STAT 1100 Final: Stat Final Study Guide

STAT 1100
Sarah Quesen

4/23/17 STAT 1100 MIDTERM (QUESEN), covering topics from Keller Chapters 1-8
• Descriptive vs. inferential statistics
  • Descriptive statistics: methods of organizing, summarizing, and presenting data in a convenient and informative way
    • Ex: graphical techniques, numerical techniques
  • Inferential statistics: a body of methods used to draw conclusions or inferences about characteristics of populations based on sample data
    • Ex: estimation and significance tests
• Population parameter & sample statistic
  • Population: the group of all items of interest
  • Parameter: a descriptive measure of a population
  • Sample: a set of data drawn from the population
  • Statistic: a descriptive measure of a sample
• Statistical inference: the process of making an estimate, prediction, or decision about a population based on a sample
• Variable: some characteristic of a population or sample
  • Ex: student grades
• Interval data: real numbers; also referred to as quantitative or numerical data
• Nominal data: the values are categories; also called qualitative or categorical data
  • Ex: responses to questions about marital status, coded Single = 1, Married = 2, Divorced = 3, Widowed = 4
• Ordinal data: appear to be categorical in nature, but their values have an order, a ranking
  • Ex: Poor = 1, Fair = 2, Good = 3, Very good = 4, Excellent = 5. We can say that excellent > poor and fair < very good
• Relative frequency distribution: lists the categories and the proportion with which each occurs
  • relative frequency = (# of observations in a class) / (total # of observations)
• Cumulative frequency distribution: used to determine the number of observations that lie above (or below) a particular value in a data set
• Bar chart: a graphical technique used for nominal data; the bars don't touch
• Pie chart: a graphical technique used for nominal data
• Univariate data: one variable
• Bivariate data: two variables
• Cross-classification table: lists the frequency of each combination of the values of two nominal variables
• Histogram: a graphical technique used for interval data; the bars touch
  • Bell shape: a specific type of symmetric unimodal histogram
  • Symmetric: a histogram is symmetric if, when we draw a vertical line down the center, the two sides are identical in shape and size
  • Skewness: a skewed histogram has a long tail extending to either the right or the left
    • Positively skewed: tail extends to the right; mean > median
    • Negatively skewed: tail extends to the left; mean < median
  • Modality: a unimodal histogram has a single peak, while a bimodal histogram has two peaks
  • Modal class: the class with the largest number of observations
• Stem-and-leaf display: retains information about individual observations that would normally be lost in the creation of a histogram
• Ogive: a graph of a cumulative frequency distribution
• Line chart: plots the value of the variable on the vertical axis against the time periods on the horizontal axis
• Cross-sectional data: observations measured at the same point in time
• Time-series data: observations measured at successive points in time
• Scatterplot (scatter diagram): plots two variables against one another
  • Independent variable: labeled x and usually placed on the horizontal axis
  • Dependent variable: labeled y and placed on the vertical axis
  • Linearity and direction are the two features of a scatter diagram's pattern we are interested in
  • Patterns: positive linear relationship, negative linear relationship, and weak or nonlinear relationship
• Good vs. bad graphs:
• Measures of central location: mean, median, mode
  • Mean: the most popular measure of central location
    • mean = (sum of the observations) / (number of observations)
    • Appropriate for describing interval (measurement) data
    • Affected by extreme values (outliers)
    • Not valid for ordinal or nominal data
  • Median: place all the observations in order; the observation that falls in the middle is the median (with an even number of observations, the average of the two middle ones)
    • Not sensitive to extreme values
    • Appropriate for ordinal data
  • Mode: the value that occurs most frequently in a set of observations
    • Can describe nominal data
  • The mean is generally our first selection, but there are circumstances when the median is better
  • The mode is seldom the best measure of central location
• Measures of variability: range, standard deviation, variance, coefficient of variation
  • Range: the simplest measure of variability
    • range = largest observation - smallest observation
  • Variance
    • Population variance: $\sigma^2$
    • Sample variance: $s^2$
  • Standard deviation: the square root of the variance. Used to compare the variability of several distributions and to make a statement about the general shape of a distribution. If the histogram is bell shaped, we can use the empirical rule
    • Population standard deviation: $\sigma = \sqrt{\sigma^2}$
    • Sample standard deviation: $s = \sqrt{s^2}$
  • Coefficient of variation: the standard deviation of a set of observations divided by their mean
    • Population: $CV = \sigma / \mu$
    • Sample: $cv = s / \bar{x}$
• Five-number summary: min, $Q_1$, median, $Q_3$, max
• Interquartile range: $IQR = Q_3 - Q_1$
  • Measures the spread of the middle 50% of the observations
• Measures of relative standing: percentiles, quartiles
  • Designed to provide information about the position of particular values relative to the entire data set
• Empirical rule (for a bell-shaped histogram):
  • Approximately 68% of all observations fall within one standard deviation of the mean
  • Approximately 95% of all observations fall within two standard deviations of the mean
  • Approximately 99.7% of all observations fall within three standard deviations of the mean
• Quartiles
  • First (lower) quartile: $Q_1$ = 25th percentile
  • Second quartile: $Q_2$ = 50th percentile (the median)
  • Third (upper) quartile: $Q_3$ = 75th percentile
• Percentiles: the Pth percentile is the value for which P percent of the observations are less than that value and (100 - P)% are greater than that value
• Location of percentiles: $L_P = (n + 1)\frac{P}{100}$ gives the approximate position of the Pth percentile in the ordered data
• Box plot: graphs five statistics: the minimum and maximum observations and the first, second, and third quartiles
• Outliers: extreme values
• Positive vs. negative linear relationships
• Measures of linear relationship: covariance, coefficient of correlation, coefficient of determination, least squares method
  • Three numerical measures provide information about the strength and direction of a linear relationship between two variables: covariance, coefficient of correlation, and coefficient of determination
  • Covariance
    • Population covariance: $\sigma_{xy}$
    • Sample covariance: $s_{xy}$
  • Coefficient of correlation: answers how strong the linear association between x and y is; it is the covariance divided by the product of the standard deviations of the two variables, and it always lies between -1 and +1
    • Population coefficient of correlation: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
    • Sample coefficient of correlation: $r = \frac{s_{xy}}{s_x s_y}$
    • +1: perfect positive linear relationship
    • 0: no linear relationship
    • -1: perfect negative linear relationship
  • Least squares method (line of best fit): the objective of a scatter diagram is to measure the strength and direction of the linear relationship; the least squares method draws the straight line through the points that minimizes the sum of squared deviations between the points and the line
    • $\hat{y} = b_0 + b_1 x$
    • $b_0$ = y-intercept
    • $b_1$ = slope
  • Coefficient of determination: measures the amount of variation in the dependent variable (y) that is explained by the variation in the independent variable (x). Calculated by squaring the coefficient of correlation: $R^2 = r^2$
  • The sign of r is the same as the sign of the slope $b_1$; the magnitude of r tells us how strongly x and y are linearly related
• Observational study: observe differences in the explanatory variable and notice any related differences in the response variable
• Experimental study: create differences in the explanatory variable and examine any resulting changes in the response variable
• Why not always use an experiment?
  1. It is sometimes unethical or impossible to assign people to receive a specific treatment
  2. Certain explanatory variables, such as handedness or gender, are inherent traits and cannot be randomly assigned
• Basic concepts
  • Randomization: balances out extraneous variables across treatments
  • Placebo: controls for the power of suggestion
  • Control group: helps us understand changes not related to the treatments
• Survey: solicits information from people, e.g., Gallup polls, pre-election polls, marketing surveys
  1. Personal interview
  2. Telephone interview
  3. Self-administered questionnaire
• Key questionnaire design principles
  1. Keep the questionnaire as short as possible
  2. Ask short, simple, and clearly worded questions
  3. Start with demographic questions to help respondents get started comfortably
  4. Use dichotomous (yes/no) and multiple-choice questions
  5. Use open-ended questions cautiously
  6. Avoid leading questions
  7. Pretest the questionnaire on a small number of people
  8. Think about the way you intend to use the collected data when preparing the questionnaire
• Response rate: the proportion of all people selected who complete the survey; a key survey parameter
• The target population and the sampled population should be similar to one another
• Sampling
  • Done for reasons of cost and practicality
  • Sampling plans: a method or procedure for specifying how a sample will be taken from a population.
  • Three sampling plans: simple random sampling, stratified random sampling, cluster sampling
  • Self-selected sample:
  • Simple random sample: a sample selected in such a way that every possible sample of the same size is equally likely to be chosen
    • Ex: drawing names out of a hat
  • Stratified random sample: obtained by separating the population into mutually exclusive sets, or strata, and then drawing simple random samples from each stratum
    • Ex: a pool of men and women
    • We can acquire information about the total population, make inferences within a stratum, or make comparisons across strata
  • Cluster sample: a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects)
    • Ex: lay a grid on a map, pick areas at random, and talk to everyone in each selected area
    • Useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically
  • Sample size: the larger the sample size, the more accurate we can expect the sample estimates to be
• Sampling error vs. nonsampling error
  • Sampling error: differences between the sample and the population that exist only because of the observations that happened to be selected for the sample
    • Increasing the sample size will reduce this error
  • Nonsampling errors: more serious; due to mistakes made in the acquisition of data or to sample observations being selected improperly
    1. Errors in data acquisition: recording of incorrect responses
    2. Nonresponse errors: error (or bias) introduced when responses are not obtained from some members of the sample
       • Response rate: the proportion of all people selected who complete the survey
    3. Selection bias: occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample
    • Increasing the sample size will not reduce this type of error
• Mutually exclusive events: two events that cannot occur together; their joint probability is 0
  • $P(A \text{ or } B) = P(A) + P(B)$
• Relative frequency approach to probability: assigning probabilities based on experimentation or historical data
• Joint probability
  • Intersection: the intersection of events A and B is the set of all sample points that are in both A and B
    • Intersection is denoted: A and B
    • The joint probability of A and B is the probability of the intersection of A and B, i.e., $P(A \text{ and } B)$
  • Union: the union of two events A and B is the event containing all sample points that are in A or B or both
    • Union of A and B is denoted: A or B
• Marginal probability: calculated by summing the joint probabilities across rows and down columns to determine the probability of each event individually
• Conditional probability: used to determine how two events are related; we determine the probability of one event given the occurrence of another related event
  • Written $P(A \mid B)$ and read as "the probability of A given B"
  • $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
• Independent events: the probability of one event is not affected by the occurrence of the other event
  • $P(A \mid B) = P(A)$ or $P(B \mid A) = P(B)$
• Complement of event A: the event consisting of all sample points that are "not in A", denoted $A^c$
  • $P(A) + P(A^c) = 1$
• Complement rule: gives the probability of an event NOT occurring
  • $P(A^c) = 1 - P(A)$
• Multiplication rule: used to calculate the joint probability of two events
  • $P(A \text{ and } B) = P(B)\,P(A \mid B) = P(A)\,P(B \mid A)$, a rearrangement of the conditional probability formula
  • If A and B are independent (the occurrence of one doesn't change the probability of the other): $P(A \text{ and } B) = P(A)\,P(B)$
• Addition rule: used to compute the probability of event A or B or both occurring, i.e., the union of A and B
  • $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
• Random variable: a function or rule that assigns a number to each outcome of an experiment
  • Can be discrete or continuous
  • Discrete random variable: one that takes on a countable number of values
    • Ex: the total on a roll of two dice: 2, 3, 4, ..., 12; integers
  • Continuous random variable: one whose values are not discrete and not countable
    • Ex: time; real numbers
    • Because there is an infinite number of values, the probability of any individual value is 0; we can only determine the probability of a range of values
• Probability distribution: a table, formula, or graph that describes the values of a random variable and the probabilities associated with those values
  • There are two types of probability distributions:
    1. Discrete probability distributions
    2. Continuous probability distributions
• Probability notation
  • X: an upper-case letter represents the name of the random variable
  • x: its lower-case counterpart represents a value of the random variable
  • $P(X = x)$ or $P(x)$: the probability that the random variable X equals x
• Discrete probability distributions: the probabilities of the values of a discrete random variable may be derived using tools such as tree diagrams or by applying one of the definitions of probability, so long as two conditions apply:
  1. $0 \le P(x) \le 1$ for all x
  2. $\sum P(x) = 1$
  • A discrete probability distribution represents a population
  • Population mean: the weighted average of all its values, where the weights are the probabilities
    • $E(X) = \mu = \sum x P(x)$
  • Population variance: the weighted average of the squared deviations from the mean
    • $V(X) = \sigma^2 = \sum (x - \mu)^2 P(x)$
• Discrete bivariate distribution
  1. $0 \le P(x, y) \le 1$ for all pairs (x, y)
  2. $\sum_x \sum_y P(x, y) = 1$
• Binomial distribution: the probability distribution that results from doing a binomial experiment
  1. There is a fixed number of trials, represented as n
  2. Each trial has two possible outcomes, a "success" and a "failure"
  3. $P(\text{success}) = p$ and $P(\text{failure}) = 1 - p$ for all trials
  4. The trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials
  • Binomial random variable: the number of successes in the n trials of a binomial experiment
• Poisson distribution: a discrete probability distribution that refers to the number of events (successes) within a specific time period or region of space
  1. The number of successes that occur in any interval is independent of the number of successes that occur in any other interval
  2. The probability of a success in an interval is the same for all equal-size intervals
  3. The probability of a success is proportional to the size of the interval
  4. The probability of more than one success in an interval approaches 0 as the interval becomes smaller
  • Poisson random variable: the number of successes that occur in a period of time or an interval of space in a Poisson experiment
• Probability density function: a function f(x) is called a probability density function over the range $a \le x \le b$ if it meets the following requirements:
  1. $f(x) \ge 0$ for all x between a and b
  2. The total area under the curve between a and b is 1.0
• Uniform probability distribution (rectangular probability distribution)
  • $f(x) = \frac{1}{b - a}$, where $a \le x \le b$
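Relative and cumulative frequencies, worked in code. This is a minimal Python sketch of the relative frequency formula and of a cumulative relative frequency distribution. The data and the helper names `relative_frequencies` and `cumulative_relative_frequencies` are hypothetical; the cumulative version assumes we count observations at or below each class upper limit.

```python
from collections import Counter


def relative_frequencies(observations):
    """Map each category to (# of observations in the class) / (total # of observations)."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}


def cumulative_relative_frequencies(values, upper_limits):
    """For each class upper limit, return the proportion of observations at or below it."""
    n = len(values)
    return [sum(1 for v in values if v <= limit) / n for limit in upper_limits]


# Hypothetical nominal data (marital status responses).
marital = ["Single", "Married", "Married", "Divorced",
           "Married", "Single", "Widowed", "Married"]
print(relative_frequencies(marital))
# {'Single': 0.25, 'Married': 0.5, 'Divorced': 0.125, 'Widowed': 0.125}

# Hypothetical interval data with classes ending at 60, 70, 80, 90, 100.
scores = [55, 62, 68, 71, 74, 77, 81, 85, 90, 96]
print(cumulative_relative_frequencies(scores, [60, 70, 80, 90, 100]))
# [0.1, 0.3, 0.6, 0.9, 1.0]
```

The relative frequencies always sum to 1, and the last cumulative value is 1 because every observation lies at or below the top class limit.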
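Mean vs. median with an outlier. This short Python sketch, using the standard-library `statistics` module, illustrates the claims that the mean is affected by extreme values while the median is not, and that a long right tail pushes the mean above the median. The data and the helper `central_location` are hypothetical.

```python
import statistics

# Hypothetical data: the same observations with and without one extreme value.
data = [3, 4, 4, 5, 6, 7, 8]
with_outlier = data + [50]


def central_location(xs):
    """Return (mean, median, mode) using the standard-library statistics module."""
    return statistics.mean(xs), statistics.median(xs), statistics.mode(xs)


for label, xs in [("original", data), ("with outlier", with_outlier)]:
    mean, median, mode = central_location(xs)
    print(f"{label}: mean = {mean:.2f}, median = {median}, mode = {mode}")
# original: mean = 5.29, median = 5, mode = 4
# with outlier: mean = 10.88, median = 5.5, mode = 4
```

One extreme value roughly doubles the mean but moves the median only from 5 to 5.5 (with eight observations it becomes the average of the two middle values).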
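Measures of variability in code. The guide names $\sigma^2$ and $s^2$ without formulas; this sketch assumes the standard definitions, dividing the sum of squared deviations by N for the population variance and by n - 1 for the sample variance. The function name `variability` and the data are hypothetical.

```python
import math


def variability(xs):
    """Range, population and sample variance, sample standard deviation, and sample cv."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations from the mean
    sample_variance = ss / (n - 1)          # s^2 divides by n - 1
    sample_sd = math.sqrt(sample_variance)  # s = sqrt(s^2)
    return {
        "range": max(xs) - min(xs),         # largest - smallest observation
        "population_variance": ss / n,      # sigma^2 divides by N (xs treated as the population)
        "sample_variance": sample_variance,
        "sample_sd": sample_sd,
        "cv": sample_sd / mean,             # cv = s / x-bar
    }


# Hypothetical data with mean 5.
print(variability([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this data the sum of squared deviations is 32, so $\sigma^2 = 32/8 = 4$ while $s^2 = 32/7$; the sample version is slightly larger because it divides by n - 1.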
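Checking the empirical rule. A rough check by simulation, assuming normally distributed (bell-shaped) data generated with `random.gauss`; the mean of 70, standard deviation of 10, sample size, and seed are arbitrary choices for illustration, and `share_within` is a hypothetical helper.

```python
import math
import random


def share_within(xs, k):
    """Proportion of observations within k standard deviations of the mean."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)  # sd of the data set itself
    return sum(1 for x in xs if abs(x - mu) <= k * sigma) / n


# Simulated bell-shaped data (normal, mean 70, sd 10); the numbers are made up.
random.seed(1)
sample = [random.gauss(70, 10) for _ in range(20_000)]
for k in (1, 2, 3):
    print(f"within {k} sd: {share_within(sample, k):.3f}")
# Expect values close to 0.68, 0.95, and 0.997 (exact figures depend on the seed).
```

For data that are not bell shaped (for example, strongly skewed data), these proportions can differ noticeably, which is why the rule is stated only for bell-shaped histograms.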
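Percentiles, quartiles, and the IQR from the location formula. This sketch uses $L_P = (n + 1)P/100$ and interpolates between neighboring ordered observations when $L_P$ is not a whole number. Clamping locations outside 1..n to the min or max is an assumption of this sketch, and the helper names and data are hypothetical.

```python
def percentile(xs, p):
    """Pth percentile via the location formula L_P = (n + 1) * P / 100.

    The whole part of L_P picks a 1-based position in the sorted data and the
    fractional part interpolates toward the next observation. Locations outside
    1..n are clamped to the min or max (an assumption for this sketch).
    """
    s = sorted(xs)
    n = len(s)
    location = (n + 1) * p / 100
    if location <= 1:
        return s[0]
    if location >= n:
        return s[-1]
    whole = int(location)
    fraction = location - whole
    return s[whole - 1] + fraction * (s[whole] - s[whole - 1])


def five_number_summary(xs):
    """(min, Q1, median, Q3, max), with the quartiles as the 25th, 50th, and 75th percentiles."""
    return (min(xs), percentile(xs, 25), percentile(xs, 50),
            percentile(xs, 75), max(xs))


data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]  # hypothetical data, n = 10
summary = five_number_summary(data)
print(summary)                             # (0, 3.75, 8.5, 16.0, 33)
print("IQR =", summary[3] - summary[1])    # IQR = 12.25
```

With n = 10, $L_{25} = 2.75$, so $Q_1$ lies three quarters of the way from the 2nd observation (0) to the 3rd (5), giving 3.75. These five numbers are exactly what a box plot draws.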
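Covariance, correlation, and the least squares line. A sketch computing the sample covariance $s_{xy}$, $r = s_{xy}/(s_x s_y)$, $R^2 = r^2$, and the least squares coefficients. The slope and intercept formulas $b_1 = s_{xy}/s_x^2$ and $b_0 = \bar{y} - b_1 \bar{x}$ are the standard ones but are not written out in the guide; the function name and data are hypothetical.

```python
import math


def linear_summary(xs, ys):
    """Sample covariance, correlation, R^2, and least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)   # s_x^2
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)   # s_y^2
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = s_xy / math.sqrt(var_x * var_y)   # r = s_xy / (s_x * s_y)
    b1 = s_xy / var_x                     # least squares slope
    b0 = y_bar - b1 * x_bar               # least squares y-intercept
    return {"s_xy": s_xy, "r": r, "R2": r ** 2, "b0": b0, "b1": b1}


xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
print({key: round(value, 3) for key, value in linear_summary(xs, ys).items()})
```

Here $s_{xy} = 1.5$, $r \approx 0.775$, $R^2 = 0.6$ (60% of the variation in y is explained by x), and $\hat{y} = 2.2 + 0.6x$. Note that r and $b_1$ share the same sign because both have $s_{xy}$ in the numerator.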
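Simple random, stratified, and cluster sampling. A sketch of the three sampling plans using Python's `random.Random.sample`. The populations, the seed, and all helper names are hypothetical, and the stratified version assumes proportional allocation (each stratum's share of the sample matches its share of the population).

```python
import random


def simple_random_sample(pop, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(pop, n)


def stratified_sample(pop, n, rng, stratum_of):
    """Split pop into strata, then draw a simple random sample from each stratum.

    Uses proportional allocation (an assumption for this sketch); because of rounding,
    the total can differ slightly from n when the strata don't divide evenly.
    """
    strata = {}
    for unit in pop:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, m, rng):
    """Pick m whole clusters at random and include every element of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]


# Hypothetical population: 1,000 IDs, 600 labelled "F" and 400 labelled "M".
population = [(i, "F" if i < 600 else "M") for i in range(1000)]
# Hypothetical map grid: 20 blocks with 5 households each.
blocks = {f"block{b:02d}": [f"house{b:02d}-{h}" for h in range(5)] for b in range(20)}

rng = random.Random(42)  # seeded so the illustration is repeatable
srs = simple_random_sample(population, 50, rng)
strat = stratified_sample(population, 50, rng, stratum_of=lambda unit: unit[1])
clus = cluster_sample(blocks, 3, rng)

print(sum(1 for _, g in strat if g == "F"), "women and",
      sum(1 for _, g in strat if g == "M"), "men")   # 30 women and 20 men
print(len(clus), "households from 3 blocks")          # 15 households from 3 blocks
```

The stratified sample always contains 30 women and 20 men, whereas a simple random sample of 50 only matches the 60/40 split on average.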
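Probability rules from a joint probability table. A sketch that applies the marginal, conditional, complement, addition, and multiplication rules above to a small joint table. The four joint probabilities and the helper `marginal` are hypothetical.

```python
import math

# Hypothetical joint probability table for events A, A^c (rows) and B, B^c (columns).
joint = {
    ("A", "B"): 0.11, ("A", "Bc"): 0.29,
    ("Ac", "B"): 0.06, ("Ac", "Bc"): 0.54,
}


def marginal(table, event, axis):
    """Sum joint probabilities across a row (axis=0) or down a column (axis=1)."""
    return sum(p for key, p in table.items() if key[axis] == event)


p_a = marginal(joint, "A", 0)                 # P(A) = 0.11 + 0.29
p_b = marginal(joint, "B", 1)                 # P(B) = 0.11 + 0.06
p_a_and_b = joint[("A", "B")]                 # joint probability P(A and B)
p_a_given_b = p_a_and_b / p_b                 # conditional probability
p_a_or_b = p_a + p_b - p_a_and_b              # addition rule
p_not_a = 1 - p_a                             # complement rule
independent = math.isclose(p_a_given_b, p_a)  # independent only if P(A|B) = P(A)

print(f"P(A|B) = {p_a_given_b:.3f}, P(A or B) = {p_a_or_b:.2f}, "
      f"P(A^c) = {p_not_a:.2f}, independent: {independent}")
# P(A|B) = 0.647, P(A or B) = 0.46, P(A^c) = 0.60, independent: False
```

Since $P(A \mid B) \approx 0.647$ differs from $P(A) = 0.40$, A and B are dependent; equivalently, $P(A)P(B) = 0.068 \ne 0.11$. Floating-point sums are compared with `math.isclose` rather than `==`.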
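Expected value and variance of a discrete distribution. This sketch checks the two conditions for a discrete probability distribution and computes $E(X) = \sum x P(x)$ and $V(X) = \sum (x - \mu)^2 P(x)$ for the total on a roll of two fair dice. It uses `fractions.Fraction` so the results are exact; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def is_valid_distribution(dist):
    """Check 0 <= P(x) <= 1 for all x and that the probabilities sum to 1."""
    return all(0 <= p <= 1 for p in dist.values()) and sum(dist.values()) == 1


def expected_value(dist):
    """E(X) = mu = sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """V(X) = sigma^2 = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Total on a roll of two fair dice, built from the 36 equally likely outcomes.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

print(is_valid_distribution(two_dice))   # True
print(expected_value(two_dice))          # 7
print(variance(two_dice))                # 35/6
```

The weights are the probabilities, so the most likely total (7, with probability 6/36) pulls the weighted average to the center of the symmetric distribution.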
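Binomial probabilities. The guide lists the conditions of a binomial experiment but not the probability formula. This sketch assumes the standard one, $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, computed with `math.comb`. The helper names and the example (10 guessed multiple-choice questions with p = 0.2) are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x) for a binomial random variable."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x), summing the probabilities of 0, 1, ..., x successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# Hypothetical example: 10 independent trials, each a success with p = 0.2.
n, p = 10, 0.2
print(round(binomial_pmf(2, n, p), 4))   # 0.302
print(round(binomial_cdf(2, n, p), 4))   # 0.6778
```

Because the trials are independent with the same p, every arrangement of exactly x successes has the same probability $p^x(1-p)^{n-x}$, and $\binom{n}{x}$ counts those arrangements.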
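Poisson probabilities. As with the binomial, the guide gives the conditions but not the formula. This sketch assumes the standard Poisson probability $P(X = x) = e^{-\mu}\mu^x / x!$, where $\mu$ is the mean number of successes per interval; the helper name and the value $\mu = 2$ are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x!, where mu is the mean number of successes per interval."""
    return exp(-mu) * mu ** x / factorial(x)


# Hypothetical example: on average mu = 2 successes per interval.
mu = 2
print(round(poisson_pmf(0, mu), 4))   # 0.1353
print(round(poisson_pmf(3, mu), 4))   # 0.1804
print(round(1 - sum(poisson_pmf(k, mu) for k in range(3)), 4))  # P(X >= 3): 0.3233
```

The last line uses the complement rule: $P(X \ge 3) = 1 - [P(0) + P(1) + P(2)]$, which avoids summing an infinite number of terms.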
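Uniform distribution probabilities as areas. For the uniform density $f(x) = 1/(b - a)$, a probability is the area of a rectangle, so $P(x_1 \le X \le x_2) = (x_2 - x_1)/(b - a)$ for an interval inside [a, b]. The helper names and the example range (2,000 to 5,000) are hypothetical.

```python
def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) for a <= x <= b, and 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_prob(x1, x2, a, b):
    """P(x1 <= X <= x2) = width * height = (x2 - x1) / (b - a), after clipping to [a, b]."""
    lo, hi = max(x1, a), min(x2, b)   # keep only the part of the interval inside [a, b]
    return max(hi - lo, 0) / (b - a)


# Hypothetical example: a quantity uniformly distributed between 2,000 and 5,000.
a, b = 2000, 5000
print(uniform_pdf(3000, a, b))            # height of the rectangle, 1/3000
print(uniform_prob(2500, 3000, a, b))     # 500/3000, about 0.1667
print(uniform_prob(4000, 4000, a, b))     # 0.0: a single value has probability 0
```

The last call shows the point made for continuous random variables: an interval of zero width has zero area, so only ranges of values carry probability.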