Types of Bias:
1) Selection bias: a subset of the population cannot be part of the sample.
2) Nonresponse bias: data cannot be obtained on all experimental units in the sample.
3) Measurement bias: inaccuracy in the recorded values of the data, e.g. caused by leading questions.

Central Tendency:
1) Mean: sum of all measurements divided by the number of measurements. Sample mean: $\bar{x} = \frac{\sum x_i}{n}$; population mean: $\mu$.
2) Median: the middle value when the measurements are ordered.
3) Mode: the measurement that occurs most frequently.

Variability:
1) Range = max - min
2) Standard deviation ($s$ for a sample, $\sigma$ for the population): $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum x_i^2 - (\sum x_i)^2 / n}{n-1}}$
3) Variance: $s^2$ for a sample, $\sigma^2$ for the population.
4) Interquartile range: $IQR = Q_U - Q_L$ (upper quartile minus lower quartile).

Relative Standing:
1) Percentile score: percentage of values that fall below $x$.
2) z-score: how many standard deviations $x$ lies from the mean.

Describing a distribution: report its center (median and mode), spread (IQR, variance), skewness, and outliers.
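A minimal sketch of the measures above using Python's standard `statistics` module. The sample data and the helper name `describe` are hypothetical. Quartile conventions differ between textbooks and software; `statistics.quantiles` uses its "exclusive" method by default, so the IQR may differ slightly from a hand computation.

```python
import statistics

def describe(data):
    """Return center, spread, and IQR measures for a sample (hypothetical helper)."""
    # Default "exclusive" quartile method; other conventions give slightly different quartiles.
    q_l, _, q_u = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),       # sample variance s^2 (divides by n - 1)
        "stdev": statistics.stdev(data),             # sample standard deviation s
        "pop_variance": statistics.pvariance(data),  # sigma^2 if data is the whole population
        "iqr": q_u - q_l,
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>12}: {value:.3f}")

# Relative standing: z-score of one measurement, (x - x_bar) / s
x = 9
z = (x - summary["mean"]) / summary["stdev"]
print(f"z-score of {x}: {z:.2f}")
```

The sample formulas divide by $n-1$ (`variance`, `stdev`), while `pvariance` divides by $n$ and matches $\sigma^2$ only when the data are the entire population.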
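Chebyshev's Rule and the Empirical Rule:
Interval           Chebyshev's Rule    Empirical Rule (mound-shaped data)
x̄ ± s   (z = 1)    At least 0%         About 68%
x̄ ± 2s  (z = 2)    At least 75%        About 95%
x̄ ± 3s  (z = 3)    At least 89%        About 99.7%

Rules for determining quantitative outliers:
1) Box plot: measurements falling between the inner and outer fences are suspect outliers; measurements falling outside the outer fences are highly suspect outliers. The inner fences lie 1.5 IQR beyond the quartiles, the outer fences 3 IQR beyond.
2) z-score: $z = \frac{x - \bar{x}}{s}$ (sample) or $z = \frac{x - \mu}{\sigma}$ (population). A z-score less than -2 or greater than 2 indicates a possible outlier; a z-score beyond ±3 indicates an outlier.

Causes of outliers: 1) incorrect measurement or recording, 2) a measurement from the wrong population, 3) a rare event.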
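A minimal sketch comparing the two rules in the table and applying both outlier rules. Chebyshev's bound $1 - 1/k^2$ holds for any data set, while the Empirical Rule percentages come from the normal distribution, here computed with `math.erf`. The data set and function names are hypothetical, and the quartile method (`statistics.quantiles` default) is an assumption.

```python
import math
import statistics

def chebyshev_bound(k):
    """Chebyshev's Rule: at least 1 - 1/k^2 of ANY data set lies within k std devs."""
    return max(0.0, 1 - 1 / k**2)

def normal_coverage(k):
    """Proportion of a normal distribution within k std devs (basis of the Empirical Rule)."""
    return math.erf(k / math.sqrt(2))

def flag_outliers(data):
    """Label each value by the box-plot fence rule and the z-score rule of thumb."""
    q_l, _, q_u = statistics.quantiles(data, n=4)  # quartile method is an assumption
    iqr = q_u - q_l
    inner = (q_l - 1.5 * iqr, q_u + 1.5 * iqr)
    outer = (q_l - 3.0 * iqr, q_u + 3.0 * iqr)
    mean, s = statistics.mean(data), statistics.stdev(data)
    results = []
    for x in data:
        if x < outer[0] or x > outer[1]:
            fence_label = "highly suspect"
        elif x < inner[0] or x > inner[1]:
            fence_label = "suspect"
        else:
            fence_label = "ok"
        z = (x - mean) / s
        if abs(z) > 3:
            z_label = "outlier"
        elif abs(z) > 2:
            z_label = "possible outlier"
        else:
            z_label = "ok"
        results.append((x, fence_label, round(z, 2), z_label))
    return results

for k in (1, 2, 3):
    print(f"k={k}: Chebyshev at least {chebyshev_bound(k):.0%}, normal {normal_coverage(k):.1%}")

data = [10, 12, 12, 13, 14, 15, 15, 16, 40]  # hypothetical measurements
for row in flag_outliers(data):
    print(row)
```

In this example the two rules disagree on 40: it lies beyond the outer fence, but because it inflates both $\bar{x}$ and $s$, its own z-score is only about 2.6. Fences based on quartiles are less sensitive to the outlier they are trying to detect.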
Skewness: we determine skewness from the length of the tails. If the left tail is longer, the distribution is left-skewed and asymmetric. Left-skewed: mean < median.

Probability Rules:
Union ($A \cup B$), A or B: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (Addition Rule); rearranged, $P(A \cap B) = P(A) + P(B) - P(A \cup B)$
Intersection ($A \cap B$), A and B:
  Independent: $P(A \cap B) = P(A)\,P(B)$ (Multiplicative Rule)
  Dependent: $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$ (Multiplicative Rule)
Complement ($A^c$), not A: $P(A^c) = 1 - P(A)$ (Complement Rule)
Conditional ($A \mid B$), A given B: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ and $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$ (Conditional Rule)
Mutually exclusive (a.k.a. disjoint): $P(A \cap B) = 0$
De Morgan's Law: $P((A \cup B)^c) = P(A^c \cap B^c)$
All possibilities (law of total probability): $P(A) = P(A \mid B)\,P(B) + P(A \mid B^c)\,P(B^c)$; also $P(B \mid A) + P(B^c \mid A) = 1$
Conditional independence: $P(B_1 \cap B_2 \mid A) = P(B_1 \mid A)\,P(B_2 \mid A)$
Bayes' Rule: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$
Independence: two events A and B are independent if $P(B \mid A) = P(B)$, equivalently $P(A \mid B) = P(A)$.
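The rules above can be checked by brute-force enumeration on a small, equally likely sample space. This sketch uses a hypothetical two-dice experiment with exact `Fraction` arithmetic so equalities hold exactly; the events, helper names, and the screening numbers in the Bayes example are all illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: the 36 equally likely outcomes of rolling two fair dice.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability as an exact fraction: |event| / |omega|."""
    return Fraction(len(event), len(omega))

def complement(event):
    return omega - event

def cond(x, y):
    """Conditional rule: P(X|Y) = P(X and Y) / P(Y)."""
    return prob(x & y) / prob(y)

A = {w for w in omega if w[0] % 2 == 0}  # first die is even
B = {w for w in omega if sum(w) == 7}    # total is 7
C = {w for w in omega if sum(w) == 12}   # total is 12

addition_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)
complement_ok = prob(complement(A)) == 1 - prob(A)
de_morgan_ok = complement(A | B) == complement(A) & complement(B)
total_ok = prob(A) == cond(A, B) * prob(B) + cond(A, complement(B)) * prob(complement(B))
independent = cond(A, B) == prob(A)      # P(A|B) = P(A)
disjoint = prob(B & C) == 0              # B and C are mutually exclusive
print(addition_ok, complement_ok, de_morgan_ok, total_ok, independent, disjoint)

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Bayes' Rule, with P(B) expanded by the law of total probability."""
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical screening numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|A^c) = 0.05
posterior = bayes(Fraction(9, 10), Fraction(1, 100), Fraction(1, 20))
print(posterior)  # 2/13
```

The Bayes example shows why the denominator matters: even with a 90% detection rate, a rare condition (1%) gives a posterior of only 2/13, about 15%.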
Combinations Rule: the number of ways to choose $n$ items from $N$ ("N choose n") is $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$.
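A short sketch of the Combinations Rule: the factorial formula, Python's built-in `math.comb` (Python 3.8+), and a direct enumeration with `itertools.combinations` should all agree. The helper name `n_choose` is hypothetical.

```python
import math
from itertools import combinations

def n_choose(N, n):
    """N choose n from the factorial formula N! / (n! (N - n)!)."""
    return math.factorial(N) // (math.factorial(n) * math.factorial(N - n))

print(n_choose(5, 2))                           # 10
print(math.comb(5, 2))                          # 10, standard-library equivalent
print(len(list(combinations(range(5), 2))))     # 10, by listing every unordered pair
```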
TYPES OF PLOTS:
A histogram shows the distribution of a continuous variable. The x-axis represents a single variable, partitioned into intervals (bins), and the height of each bar shows the number of observations in that bin.
A bar chart is used for qualitative data. Each bar represents a separate category.
A Pareto chart is a special bar chart in which the bars are arranged in decreasing order of frequency, with a line showing the cumulative relative frequency.
Box plot: lower inner fence $LIF = Q_L - 1.5 \times IQR$, upper inner fence $UIF = Q_U + 1.5 \times IQR$.
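A minimal sketch of the table behind a Pareto chart: categories sorted by decreasing frequency, plus the running cumulative relative frequency that the line traces. It computes the data only (no drawing); the defect log and the helper name `pareto_table` are hypothetical.

```python
from collections import Counter

def pareto_table(observations):
    """Rows of (category, count, cumulative relative frequency), most frequent first."""
    counts = Counter(observations).most_common()  # sorted by decreasing count
    total = sum(count for _, count in counts)
    rows, running = [], 0
    for category, count in counts:
        running += count
        rows.append((category, count, running / total))
    return rows

# Hypothetical defect log for a production line
defects = ["scratch"] * 12 + ["dent"] * 5 + ["crack"] * 2 + ["stain"]
table = pareto_table(defects)
for category, count, cumulative in table:
    print(f"{category:<8}{count:>4}{cumulative:>8.0%}")
```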
Normal Approximation to the Binomial:
The percentage of body fat in men is a normal random variable with mean 15% and standard deviation 3%.
a) If a measure of 20% or more body fat characterizes a person as obese, what is the approximate probability that a random sample of 1000 sailors will contain at most 50 who would be characterized as obese?
b) If the Navy actually checked the percentage of body fat for a random sample of 1000 of its sailors, and only 30 of them had 20% (or higher) body fat, could it be concluded that the Navy was successful in reducing the percentage of obese sailors below the percentage in the general population? Base your conclusion on an appropriate test conducted at the 5% significance level.
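a) Let $X$ be the percentage of body fat. The probability that one man is obese is
$p = P(X \ge 20) = P\left(Z \ge \frac{20 - 15}{3}\right) = P(Z \ge 1.67) \approx 0.0475$.
Let $Y$ be the number of obese sailors in the sample, so $Y \sim \text{Binomial}(n = 1000, p = 0.0475)$ with $\mu = np = 47.5$ and $\sigma = \sqrt{np(1-p)} = \sqrt{1000(0.0475)(0.9525)} \approx 6.73$.
Using the normal approximation with continuity correction:
$P(Y \le 50) \approx P\left(Z \le \frac{50.5 - 47.5}{\sqrt{1000(0.0475)(0.9525)}}\right) = P(Z \le 0.45) \approx 0.67$.
b) Test $H_0: p = 0.0475$ against $H_a: p < 0.0475$, with $\hat{p} = 30/1000 = 0.03$:
$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.03 - 0.0475}{\sqrt{0.0475(0.9525)/1000}} \approx \frac{-0.0175}{0.00673} \approx -2.60$.
The rejection region at $\alpha = 0.05$ is $z < -1.645$. Since $-2.60 < -1.645$, reject $H_0$: there is sufficient evidence that the proportion of obese sailors is below that of the general population.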
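A sketch that reproduces both parts numerically with `statistics.NormalDist` (Python 3.8+) and `math.comb`, and also computes the exact binomial probability for comparison. One assumption to note: the code uses the exact $z = 5/3$ instead of the table value 1.67, so $p \approx 0.0478$ rather than 0.0475 and the printed numbers differ slightly from the hand calculation; the conclusion of the test is the same.

```python
import math
from statistics import NormalDist

Z = NormalDist()                          # standard normal
body_fat = NormalDist(mu=15, sigma=3)     # percent body fat
n = 1000

# a) p = P(body fat >= 20%), using the exact z = 5/3
p = 1 - body_fat.cdf(20)
mu_y = n * p
sigma_y = math.sqrt(n * p * (1 - p))
approx = Z.cdf((50.5 - mu_y) / sigma_y)   # normal approximation with continuity correction
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(51))  # exact P(Y <= 50)
print(f"p = {p:.4f}, normal approx = {approx:.4f}, exact binomial = {exact:.4f}")

# b) one-sided large-sample test of H0: p = p0 vs Ha: p < p0 at alpha = 0.05
p_hat = 30 / n
z = (p_hat - p) / math.sqrt(p * (1 - p) / n)
critical = Z.inv_cdf(0.05)                # about -1.645
p_value = Z.cdf(z)
print(f"z = {z:.2f}, critical = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {z < critical}")
```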
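Central Limit Theorem: for a random sample of size $n$, the sampling distribution of $\bar{x}$ is approximately normal, with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$, when $n$ is large. If $n$ is small, this holds only if the population itself is normally distributed, so we must assume that it is.

Probability Distributions:
1) Binomial (discrete data, parameters $n$ and $p$): $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$, mean: $\mu = np$,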