University of Pittsburgh
STAT 1100

Sarah Quesen

Spring

4/23/17
STAT 1100 MIDTERM QUESEN covering topics from Keller Chapters 1-8
• Descriptive vs inferential statistics
• Descriptive statistics: deals with methods of organizing, summarizing, and presenting the
data in a convenient and informative way
• Ex: graphical techniques, numerical techniques
• Inferential statistics: a body of methods used to draw conclusions or inference about
characteristics of populations based on sample data
• Ex: estimation and significance test
• Population parameter & sample statistic
• Population: the group of all items of interest
• Parameter: a descriptive measure of a population
• Sample: a set drawn from the population
• Statistic: a descriptive measure of a sample
• Statistical inference: the process of making an estimate, prediction, or decision about a
population based on a sample
• Variable: some characteristic of a population or sample
• Ex: student grades
• Interval data: Real numbers, also referred to as quantitative or numerical
• Nominal data: Values of nominal data are categories, also called qualitative or categorical
• Ex: Responses to questions about marital status. Single = 1, Married = 2, Divorced = 3,
Widowed = 4
• Ordinal data: appear to be categorical in nature, but their values have an order, a ranking to
them. Ex: Poor = 1, fair = 2, good = 3, very good = 4, excellent = 5. We can say that excellent > poor
and fair < very good
• Relative frequency distribution: lists the categories and the proportion with which each occurs
• Relative frequency = (# of observations in a class) / (total # of observations)
• Cumulative frequency distribution: is used to determine the number of observations that lie
above (or below) a particular value in a data set
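The two frequency definitions above can be sketched in a few lines of Python. The survey responses and class counts are made-up data, and the function names are only illustrative.

```python
from collections import Counter
from itertools import accumulate

def relative_frequencies(observations):
    """Map each category to (count in category) / (total observations)."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

def cumulative_frequencies(class_counts):
    """Running totals of class counts, in the order the classes are given."""
    return list(accumulate(class_counts))

# Hypothetical marital-status responses (not from the course material).
responses = ["single", "married", "married", "divorced",
             "married", "single", "widowed", "married"]
rel = relative_frequencies(responses)
print(rel)

# Hypothetical counts for three ordered classes of an interval variable.
print(cumulative_frequencies([3, 5, 2]))
```

The relative frequencies always sum to 1, and the last cumulative frequency equals the total number of observations.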
• Bar chart: a graphical technique used for nominal data. Bars don’t touch
• Pie chart: a graphical technique used for nominal data
• Univariate data: one variable
• Bivariate data: two variables
• Cross classification table: lists the frequency of each combination of the values of the two
nominal variables
• Histogram: a graphical technique used for interval data. Bars touch
• Bell shape: a specific type of symmetric unimodal histogram
• Symmetric: a histogram is symmetric if when we draw a vertical line down the center, the two
sides are identical in shape and size
• Skewness: a skewed histogram is one with a long tail extending to either the right or the left
• Positively skewed: tail goes to the right. Mean > Median
• Negatively skewed: tail goes to the left. Mean < Median
• Modality: a unimodal histogram is one with a single peak, while a bimodal histogram is one with
two peaks
• Modal class: the class with the largest number of observations
• Stem and leaf display: retains information about individual observations that would normally be
lost in the creation of a histogram
• Ogive: a graph of a cumulative frequency distribution
• Line chart: plots the value of the variable on the vertical axis against the time periods on the
horizontal axis
• Cross-sectional data: observations measured at the same point in time
• Time-series data: observations measured at successive points in time
• Scatterplot: plots two variables against one another
• Independent variable: is labeled x, and is usually placed on the horizontal axis
• Dependent variable: y, is mapped on the vertical axis
• Linearity and direction are the two features we are interested in when looking at the patterns of scatter diagrams
• Positive linear relationship, negative linear relationship, & weak or non-linear relationship
• Good vs bad graphs:
• Measures of central location: mean, median, mode
• Mean: the most popular measure of central location
• mean = (sum of the observations)/(number of observations)
• Appropriate for describing measurement data
• Affected by extreme values (outliers)
• Not valid for ordinal and nominal data
• Median: calculated by placing all the observations in order, the observation that falls in the
middle is the median
• Not sensitive to extreme values
• Appropriate for ordinal data
• Mode: of a set of observations is the value that occurs most frequently
• Can describe nominal data
• The mean is generally our first selection; there are circumstances when the median is better
• The mode is seldom the best measure of central location
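A small sketch of how the three measures of central location react to an outlier, using Python's standard `statistics` module. The salary figures are hypothetical.

```python
import statistics

salaries = [40, 42, 45, 45, 48]      # hypothetical data, in $1000s
with_outlier = salaries + [400]      # add one extreme value

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)
mode_before = statistics.mode(salaries)

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
print("mode:  ", mode_before)
```

The mean is pulled sharply toward the extreme value while the median does not move, which is the circumstance in which the median is the better measure.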
• Measures of variability: range, standard deviation, variance, coefficient of variation
• Range: simplest measure of variability
• Range = largest observation – smallest observation
• Variance:
• Population variance: σ²
• Sample variance: s²
• Standard deviation: square root of the variance. Used to compare the variability of several
distributions and make a statement about the general shape of a distribution. If the histogram is
bell shaped, we can use the empirical rule
• Population standard deviation: σ = √(σ²)
• Sample standard deviation: s = √(s²)
• Coefficient of variation: of a set of observations is the standard deviation of the observations
divided by their mean
• Population: CV = σ/μ
• Sample: cv = s/x̄
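The measures of variability can be computed with the standard `statistics` module, as in the sketch below. The notes list only the symbols, so the divisors used here (N for a population, n - 1 for a sample) are the usual textbook definitions rather than something stated above, and the data are hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations

# Treating the data as a whole population: squared deviations divided by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treating the data as a sample: squared deviations divided by n - 1.
sample_var = statistics.variance(data)
sample_sd = statistics.stdev(data)

data_range = max(data) - min(data)
mean = statistics.mean(data)
cv_population = pop_sd / mean      # CV = sigma / mu
cv_sample = sample_sd / mean       # cv = s / x-bar

print(data_range, pop_var, pop_sd, round(sample_var, 3), round(cv_sample, 3))
```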
• Five-number summary: Min, Q1, Median, Q3, Max
• Interquartile range (IQR) = Q3 - Q1
• Measures the spread of the middle 50% of the observations
• Measures of Relative Standing: percentiles, quartiles
• Designed to provide information about the position of particular values relative to the
entire data set
• Empirical rule:
• 68% of all observations fall within one standard deviation of the mean
• 95% of all observations fall within two standard deviations of the mean
• 99.7% of all observations fall within three standard deviations of the mean
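The three percentages can be checked against a normal curve, which is one specific bell-shaped distribution. This is only a sketch: it assumes the data are exactly normal, whereas the empirical rule is used as an approximation for any bell-shaped histogram.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```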
• Quartiles:
• First (lower) quartile: Q1 = 25th percentile
• Second quartile: Q2 = 50th percentile (the median)
• Third (upper) quartile: Q3 = 75th percentile
• Percentiles: the Pth percentile is the value for which P percent of the observations are less than that value and (100 - P)% are greater than that value
• Location of percentiles: allows us to approximate the location of any percentile
• L_P = (n + 1)P/100
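The location formula can be turned into a small percentile function. The notes give only the location L_P, so the linear interpolation between neighbouring observations when L_P is not a whole number is an assumption of this sketch, and the data set is hypothetical.

```python
def percentile(data, p):
    """Value at location L_P = (n + 1) * P / 100 in the ordered data.

    Assumption: when L_P falls between two positions, interpolate
    linearly between the two neighbouring observations.
    """
    ordered = sorted(data)
    n = len(ordered)
    location = (n + 1) * p / 100
    lower = int(location)            # 1-based position of the lower neighbour
    fraction = location - lower
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

hours = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]   # hypothetical observations
q1, q2, q3 = (percentile(hours, p) for p in (25, 50, 75))
five_number_summary = (min(hours), q1, q2, q3, max(hours))
iqr = q3 - q1
print(five_number_summary, iqr)
```

With n = 10 the 25th percentile sits at location 2.75, that is, three quarters of the way from the 2nd to the 3rd ordered observation.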
• Box plots: technique that graphs five statistics
• Min & max observations, 1st, 2nd, and 3rd quartiles
• Outliers: Extreme values
• Positive vs negative linear relationships
• Measures of Linear Relationship: covariance, correlation, determination, least squares method
• Three numerical measures of linear relationship that provide information as to the strength &
direction of a linear relationship between two variables: covariance, coefficient of correlation,
& coefficient of determination
• Covariance:
• Population covariance: σ_xy
• Sample covariance: s_xy
• Coefficient of correlation: answers how strong the association between x and y is. It is the covariance
divided by the product of the standard deviations of the variables. Fixed range from -1 to +1
• Population coefficient of correlation: ρ = σ_xy / (σ_x σ_y)
• Sample coefficient of correlation: r = s_xy / (s_x s_y)
• +1: Perfect (strongest possible) positive linear relationship
• 0: No linear relationship
• -1: Perfect (strongest possible) negative linear relationship
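• Least squares method / line of best fit: the objective of the scatter diagram is to measure the strength
and direction of the linear relationship. The least squares method produces the straight line through the points for which the sum of squared deviations between the points and the line is minimized
• ŷ = b0 + b1x
• b0 = y-intercept
• b1 = slope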
• Coefficient of determination: measures the amount of variation in the dependent variable (y)
that is explained by the variation in the independent variable (x). Calculated by squaring the
coefficient of correlation: R² = r²
• The sign of r is the same as the sign of the slope. r tells us how strongly x and y are related
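The measures of linear relationship can be computed together, as in the sketch below. The notes list the symbols but not the formulas, so the sample covariance with an n - 1 divisor and the least squares coefficients b1 = s_xy / s_x² and b0 = ȳ - b1·x̄ are the usual textbook forms, assumed here. The data are hypothetical.

```python
import math
from statistics import mean

def sample_covariance(x, y):
    """s_xy = sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def describe_linear_relationship(x, y):
    s_xy = sample_covariance(x, y)
    s_x = math.sqrt(sample_covariance(x, x))   # sample standard deviation of x
    s_y = math.sqrt(sample_covariance(y, y))   # sample standard deviation of y
    r = s_xy / (s_x * s_y)                     # coefficient of correlation
    b1 = s_xy / s_x ** 2                       # slope of the least squares line
    b0 = mean(y) - b1 * mean(x)                # y-intercept
    return {"s_xy": s_xy, "r": r, "R2": r ** 2, "b0": b0, "b1": b1}

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
fit = describe_linear_relationship(x, y)
print(fit)
```

Here r and b1 are both positive, and R² is 0.6, meaning 60% of the variation in y is explained by the variation in x for these made-up points.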
• Observational study: observe differences in the explanatory variable and notice any related
differences in the response variable
• Experimental study: create differences in the explanatory variable and examine any resulting
changes in the response variable
• Why not always use an experiment?
• 1. Sometimes unethical or impossible to assign people to receive a specific treatment
• 2. Certain explanatory variables, such as handedness or gender, are inherent traits and
cannot be randomly assigned
• Basic concepts
• Randomization: to balance out extraneous variables across treatments
• Placebo: to control for the power of suggestion
• Control group: to understand changes not related to the treatments
• Survey: solicits information from people, e.g. Gallup polls, pre-election polls, marketing
surveys
• 1. Personal interview
• 2. Telephone interview
• 3. Self-administered questionnaire
• Key design principles
• 1. Keep the questionnaire as short as possible
• 2. Ask short, simple, and clearly worded questions
• 3. Start with demographic questions to help respondents get started comfortably
• 4. Use dichotomous (yes/no) and multiple choice questions
• 5. Use open-ended questions cautiously
• 6. Avoid using leading questions
• 7. Pretest a questionnaire on a small number of people
• 8. Think about the way you intend to use the collected data when preparing the
questionnaire
• Response rate: the proportion of all people selected who complete the survey; it is a key survey
parameter
• Target population and Sampled population should be similar to one another
• Sampling
• Done for reasons of cost and practicality
• Sampling plans: a method or procedure for specifying how a sample will be taken from a
population. Three methods: simple random sampling, stratified random sampling, cluster
sampling
• Self selected sample:
• Simple random sample: a sample selected in such a way that every possible sample of the same
size is equally likely to be chosen
• Ex: names out of a hat
• Stratified random sample: obtained by separating the population into mutually exclusive sets, or
strata, and then drawing simple random samples from each stratum
• Ex: pool of men and women
• We can acquire information about the total population, make inferences within a stratum, or make
comparisons across strata
• Cluster sample: a simple random sample of groups or clusters of elements (vs. a simple random
sample of individual objects)
• Ex: grid on map, talk to everyone in area
• This method is useful when it is difficult or costly to develop a complete list of the
population members or when the population elements are widely dispersed
geographically
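The three sampling plans can be contrasted with a short simulation. Everything here is hypothetical: the population of 100 units, the `sex` and `block` fields, and the function names are invented for illustration, and the seed is fixed only so the sketch is reproducible.

```python
import random

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility

# Hypothetical population: 100 people, each with a sex and a city block.
population = [{"id": i, "sex": "F" if i % 2 else "M", "block": i // 10}
              for i in range(100)]

def simple_random_sample(pop, n):
    """Every possible sample of size n is equally likely."""
    return rng.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    """Split into mutually exclusive strata, then take an SRS from each stratum."""
    strata = {}
    for unit in pop:
        strata.setdefault(unit[key], []).append(unit)
    return [u for members in strata.values()
            for u in rng.sample(members, n_per_stratum)]

def cluster_sample(pop, key, n_clusters):
    """Take an SRS of whole clusters and keep everyone in the chosen clusters."""
    clusters = sorted({unit[key] for unit in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [u for u in pop if u[key] in chosen]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, "sex", 5)
clus = cluster_sample(population, "block", 2)
print(len(srs), len(strat), len(clus))
```

The stratified sample is guaranteed to contain five men and five women, while the cluster sample contains all ten people from each of two randomly chosen blocks.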
• Sample size: the larger the sample size, the more accurate we can expect the sample estimates to
be
• Sampling error vs nonsampling error
• Sampling error: differences between the sample and the population that exist only because
of the observations that happened to be selected for the sample
• Increasing the sample size will reduce this error
• Nonsampling errors: more serious than sampling errors; they are due to mistakes made in the acquisition of data
or to the sample observations being selected improperly
• 1. Errors in data acquisition
• Recording of incorrect responses
• 2. Nonresponse errors
• Error (or bias) introduced when responses are not obtained from some members of
the sample
• Response rate: the proportion of all people selected who complete the survey
• 3. Selection bias
• When the sampling plan is such that some members of the target population cannot
possibly be selected for inclusion in the sample
• Increasing the sample size will not reduce this type of error
• Mutually exclusive events: two events that cannot occur
together; their joint probability is 0
• P(A or B) = P(A) + P(B)
• Relative frequency approach to probability: assigning probabilities based on experimentation
or historical data
• Joint probability:
• Intersection: of events A and B is the set of all sample points that are in both A and B
• Intersection is denoted: A and B
• The joint probability of A and B is the probability of the intersection of A and B, i.e.
P(A and B)
• Union: of two events A and B, event containing all sample points that are in A or B or both
• Union of A and B is denoted: A or B
• Marginal probability: we can calculate the marginal probabilities by summing across rows and
down columns to determine the probabilities of x and y individually
• Conditional probability: used to determine how two events are related; we determine the
probability of one event given the occurrence of another related event
• Written P(A|B) and read as “the probability of A given B”
• P(A|B) = P(AandB)/P(B)
• The probability of an event given that another event has occurred
• Independent events: the probability of one event is not affected by the occurrence of the other
event
• P(A|B) = P(A) or P(B|A) = P(B)
• Complement of event A: the event consisting of all sample points that are “not in A”, denoted A^c
• P(A) + P(A^c) = 1
• Complement rule: gives the probability of an event NOT occurring
• P(A^c) = 1 - P(A)
• Multiplication rule: used to calculate the joint probability of two events
• P(A and B) = P(A|B) * P(B)
• If A and B are independent (outcome of A doesn’t change outcome of B): P(A and B) =
P(A) * P(B)
• Addition rule: used to compute the probability of event A or B or both A and B occurring, i.e.
the union of A and B: P(A or B) = P(A) + P(B) - P(A and B)
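The probability rules can be checked against a small table of joint probabilities. The four joint probabilities below are hypothetical, and exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical joint probabilities for events A and B (they sum to 1).
joint = {
    ("A", "B"): Fraction(12, 100),
    ("A", "not B"): Fraction(28, 100),
    ("not A", "B"): Fraction(18, 100),
    ("not A", "not B"): Fraction(42, 100),
}

# Marginal probabilities: sum across a row or down a column of the table.
p_a = joint[("A", "B")] + joint[("A", "not B")]
p_b = joint[("A", "B")] + joint[("not A", "B")]

p_a_and_b = joint[("A", "B")]            # joint probability
p_a_given_b = p_a_and_b / p_b            # conditional probability P(A|B)
p_a_or_b = p_a + p_b - p_a_and_b         # addition rule
p_not_a = 1 - p_a                        # complement rule
independent = p_a_given_b == p_a         # independence check

print(p_a, p_b, p_a_given_b, p_a_or_b, p_not_a, independent)
```

In this made-up table P(A|B) equals P(A), so A and B are independent and P(A and B) equals P(A) * P(B).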
• Conditional probability:
• Random variable: a function or rule that assigns a number to each outcome of an experiment
• Can be discrete or continuous
• Discrete random variable: one that takes on a countable number of values
• Ex: values on the roll of two dice: 2,3,4…12; integers
• Continuous random variable: one whose values are not discrete, not countable
• Ex: time; real numbers
• Because there is an infinite number of values, the probability of each individual value is
virtually 0. We can determine the probability of a range of values only
• Probability distributions: a table, formula, or graph that describes the values of a random
variable and the probability associated with these values
• There are two types of probability distributions:
• 1. Discrete probability distributions
• 2. Continuous probability distributions
• Probability Notation
• X: an upper-case letter will represent the name of the random variable
• x: its lower-case counterpart will represent the value of the random variable
• P(X=x) or P(x): the probability that the random variable X will equal x
• Discrete probability distributions: the probabilities of the values of a discrete random variable
may be derived by means of probability tools such as tree diagrams or by applying one of the definitions of
probability, so long as two conditions apply:
• 1. 0 ≤ P(x) ≤ 1 for all x
• 2. ∑P(x) = 1
• Represents a population
• Population mean: the weighted average of all its values. The weights are the
probabilities
• Expected value of X: E(X) = μ = ∑xP(x)
• Population variance: the weighted average of the squared deviations from the mean
• V(X) = σ² = ∑(x-μ)²P(x)
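The two requirements and the formulas for E(X) and V(X) can be applied directly to a distribution stored as a dictionary. The example distribution, the number of heads in two tosses of a fair coin, is chosen here for illustration and is not from the notes.

```python
def is_valid_distribution(dist):
    """Check 0 <= P(x) <= 1 for all x and that the probabilities sum to 1."""
    probabilities = dist.values()
    return (all(0 <= p <= 1 for p in probabilities)
            and abs(sum(probabilities) - 1) < 1e-9)

def expected_value(dist):
    """E(X) = mu = sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X) = sigma^2 = sum of (x - mu)^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# X = number of heads in two tosses of a fair coin.
heads = {0: 0.25, 1: 0.50, 2: 0.25}
print(is_valid_distribution(heads), expected_value(heads), variance(heads))
```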
• Discrete bivariate distribution
• 1. 0 ≤ P(x,y) ≤ 1
• 2. ∑ ∑ P(x,y) = 1
• Binomial distribution: the probability distribution that results from doing a “binomial
experiment”
• 1. Fixed number of trials, represented as n
• 2. each trial has two possible outcomes, a “success” and “failure”
• 3. P(success)=p and P(failure)=1-p for all trials
• 4. The trials are independent, which means that the outcome of one trial does not affect the
outcomes of any other trials
• Binomial random variable: the random variable of a binomial experiment is defined as the
number of successes in the n trials
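A binomial random variable can be tabulated with a few lines of Python. The notes above do not give the probability formula, so the standard form P(X = x) = C(n, x) p^x (1 - p)^(n - x) is assumed here, and the values n = 10 and p = 0.2 are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n independent trials with P(success) = p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.2   # hypothetical: 10 trials, 20% chance of success on each
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in distribution.items():
    print(x, round(prob, 4))
```

Because the table is a discrete probability distribution, its probabilities sum to 1, which is a quick sanity check on the formula.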
• Poisson distribution: a discrete probability distribution that describes the number of events
(successes) within a specific time period or region of space
• 1. The number of successes that occur in any interval is independent of the number of successes
that occur in any other interval
• 2. The probability of a success in an interval is the same for all equal-size intervals
• 3. The probability of a success is proportional to the size of the interval
• 4. The probability of more than one success in an interval approaches 0 as the interval
becomes smaller
• Poisson random variable: the number of successes that occur in a period of time or an interval
of space in a Poisson experiment
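Poisson probabilities can be sketched the same way. The formula P(X = x) = e^(-μ) μ^x / x!, with μ the mean number of successes per interval, is the standard one and is not stated in the notes above. The value μ = 2 is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) when successes occur at an average of mu per interval."""
    return exp(-mu) * mu ** x / factorial(x)

mu = 2   # hypothetical average number of successes per interval
for x in range(6):
    print(x, round(poisson_pmf(x, mu), 4))
```

Unlike the binomial, a Poisson random variable has no upper limit on the number of successes, so the probabilities only sum to 1 over all x = 0, 1, 2, and so on.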
• Probability density function: a function f(x) is called a probability density function over the range a ≤ x ≤ b if it meets the
following requirements
• 1. f(x) ≥ 0 for all x between a and b
• 2. The total area under the curve between a and b is 1.0
• Uniform probability distribution: rectangular probability distribution
• f(x) = 1/(b - a), where a ≤ x ≤ b
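For the uniform distribution, the probability of a range of values is the area of a rectangle: width of the range times the height 1/(b - a). The sketch below assumes the density is defined on a ≤ x ≤ b and is 0 elsewhere. The endpoints 0 and 10 are hypothetical.

```python
def uniform_pdf(x, a, b):
    """Height of the rectangular density: 1 / (b - a) inside [a, b], else 0."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_probability(x1, x2, a, b):
    """P(x1 <= X <= x2) = (width of the part of [x1, x2] inside [a, b]) / (b - a)."""
    low, high = max(x1, a), min(x2, b)
    return max(high - low, 0) / (b - a)

a, b = 0, 10   # hypothetical range of the random variable
print(uniform_pdf(4, a, b))
print(uniform_probability(2, 5, a, b))
print(uniform_probability(a, b, a, b))   # total area under the curve
```

The last line confirms the second requirement of a probability density function: the total area between a and b is 1.0.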
