# Probability Notes


University of Waterloo
STAT 230 (Statistics)
Instructor: Mu Zhu
Winter term

Probability
Definition: Probability
P(A) – the probability that event A occurs.
Sample space/Events
Definition: The sample space S is the set of all possible outcomes; an event is a subset of S.
Example:
Tossing two dice, the sample space is S = {(1,1), (1,2), ..., (6,6)}, so |S| = 36.
Let A = {obtaining a total of 10} = {(4,6), (5,5), (6,4)}, B = {obtaining a total greater than 10} = {(5,6), (6,5), (6,6)}.
P(A) = |A|/|S| = 3/36, P(B) = |B|/|S| = 3/36.
Remark: If the elementary outcomes are equally likely, then P(A) = |A|/|S|.
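The equally-likely rule P(A) = |A|/|S| can be checked by brute-force enumeration; a minimal sketch (the variable names are mine, not from the notes):

```python
from itertools import product

# Enumerate the 36 equally likely outcomes of tossing two dice.
sample_space = list(product(range(1, 7), repeat=2))

A = [o for o in sample_space if sum(o) == 10]   # total of 10
B = [o for o in sample_space if sum(o) > 10]    # total greater than 10

p_A = len(A) / len(sample_space)   # |A|/|S| = 3/36
p_B = len(B) / len(sample_space)   # |B|/|S| = 3/36
```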
Counting:
Multiplicative Principle:
Suppose there are p ways to do job 1 and q ways to do job 2; then there are p × q ways to do both job 1 and job 2.
Additive Principle:
Suppose there are p ways to do job 1 and q ways to do job 2; then there are p + q ways to do job 1 or job 2.
Example:
The letters of the word 'statistics' are arranged in random order. The ten letters are 3 s's, 3 t's, 2 i's, 1 a, and 1 c, so the number of distinct arrangements is 10!/(3! 3! 2!) = 50400.
a. What is the probability of spelling 'statistics'?
P(spelling the word 'statistics') = 1/50400.
b. What is the probability of getting the same letter at both ends?
P(same letter at both ends) = P(both s) + P(both t) + P(both i) = (3·2 + 3·2 + 2·1)/(10·9) = 14/90 = 7/45.
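Both answers can be checked numerically; a quick sketch using only the letter counts:

```python
from math import factorial

# Distinct arrangements of 'statistics': 10 letters with repeats 3 s, 3 t, 2 i.
arrangements = factorial(10) // (factorial(3) * factorial(3) * factorial(2))

p_spell = 1 / arrangements

# Same letter at both ends: both s, both t, or both i
# (choose the two end positions without replacement).
p_same_ends = (3 * 2 + 3 * 2 + 2 * 1) / (10 * 9)
```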
Ways of Choosing
There are n objects in total. How many different ways are there to choose k of them?
1. Without replacement, order matters:
n(n − 1)(n − 2) ··· (n − k + 1) = n!/(n − k)!
2. With replacement, order matters: n^k
3. Without replacement, order doesn't matter:
Since any selection of k objects can be randomly shuffled into k! orders, therefore:
n!/((n − k)! k!) = C(n, k)
4. With replacement, order doesn't matter:
C(n + k − 1, k)
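The four counting formulas can be verified by direct enumeration with `itertools`; a small sketch with n = 5, k = 3 (my choice of numbers):

```python
from itertools import combinations, combinations_with_replacement, permutations, product
from math import comb, factorial

n, k = 5, 3
objects = range(n)

# 1. Without replacement, order matters: n!/(n-k)!
case1 = len(list(permutations(objects, k)))
# 2. With replacement, order matters: n**k
case2 = len(list(product(objects, repeat=k)))
# 3. Without replacement, order doesn't matter: C(n, k)
case3 = len(list(combinations(objects, k)))
# 4. With replacement, order doesn't matter: C(n+k-1, k)
case4 = len(list(combinations_with_replacement(objects, k)))
```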
Example: Birthday Problem
a. Suppose there are n people. Let A be the event that none of them share a birthday.
P(A) = 365 · 364 ··· (365 − n + 1) / 365^n
b. How large does n need to be in order for this probability to be less than 50%?
Answer: n = 23.
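The answer n = 23 can be reproduced by computing P(A) directly:

```python
# P(no shared birthday among n people) = 365 * 364 * ... * (365-n+1) / 365**n
def p_no_match(n):
    prob = 1.0
    for i in range(n):
        prob *= (365 - i) / 365
    return prob

# Smallest n for which the probability of no match drops below 50%.
n = 1
while p_no_match(n) >= 0.5:
    n += 1
```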
Example: Poker
A 5-card hand is drawn from a standard 52-card deck, so there are C(52, 5) = 2,598,960 equally likely hands. For instance,
P(full house) = C(13,1) C(4,3) C(12,1) C(4,2) / C(52,5) = 3744/2,598,960 ≈ 0.00144.
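As one example of this kind of count, the full-house probability can be checked with `math.comb` (the choice of hand here is mine):

```python
from math import comb

total_hands = comb(52, 5)   # 2,598,960 equally likely 5-card hands

# Full house: a rank for the triple, 3 of its 4 suits,
# a different rank for the pair, 2 of its 4 suits.
full_house = comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 2)
p_full_house = full_house / total_hands
```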
Rules of Probability
1. 0 ≤ P(A) ≤ 1
2. P(S) = 1
3. P(A^c) = 1 − P(A)
4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
5. Definition: A and B are mutually exclusive if A ∩ B = ∅, i.e. P(A ∩ B) = 0.
6. Definition: If A and B are independent, it means P(A ∩ B) = P(A) P(B), i.e. P(A|B) = P(A).
7. Definition: The conditional probability P(A|B) is the probability that event A happens given that event B has already happened, i.e. P(A|B) = P(A ∩ B)/P(B), so P(A ∩ B) = P(A|B) P(B).
Note (notation):
Not A = A^c (complement)
A and B = A ∩ B (intersection)
A or B = A ∪ B (union)
Remark:
1. P(A ∩ B^c) = P(A) − P(A ∩ B)
2. P(A^c ∩ B^c) = 1 − P(A ∪ B)
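These rules can be sanity-checked on the two-dice sample space with exact arithmetic; a small sketch (the events A and B are my own examples):

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))   # two dice
A = {o for o in S if sum(o) == 10}        # total of 10
B = {o for o in S if o[0] == 6}           # first die shows 6

def P(event):
    # Equally likely outcomes: P(E) = |E|/|S|.
    return Fraction(len(event), len(S))

rule3_ok = P(S - A) == 1 - P(A)                    # complement rule
rule4_ok = P(A | B) == P(A) + P(B) - P(A & B)      # inclusion-exclusion
remark1_ok = P(A - B) == P(A) - P(A & B)           # P(A and not-B)
```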
Law of Total Probability
Suppose A_1, A_2, ..., A_k is a partition of S, i.e. ⋃_i A_i = S and A_i ∩ A_j = ∅ for i ≠ j.
Clearly, P(B) = Σ_i P(B ∩ A_i)
= Σ_i P(B|A_i) P(A_i)
Figure 1: Partition of sample space
Bayes’ Theorem
Standard Form
P(A|B) = P(A) P(B|A) / P(B)
Extended Form
By applying the Law of Total Probability to the denominator, we can extend the standard form as below:
P(A_i|B) = P(B|A_i) P(A_i) / Σ_j P(B|A_j) P(A_j)
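A standard use of the extended form is a diagnostic-test calculation; a sketch with hypothetical numbers of my own choosing (prevalence 1%, sensitivity 95%, false-positive rate 5%):

```python
# Hypothetical numbers, not from the notes.
p_d = 0.01        # P(D): prevalence of the disease
p_pos_d = 0.95    # P(+ | D): sensitivity
p_pos_nd = 0.05   # P(+ | not D): false-positive rate

# Denominator via the Law of Total Probability.
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' theorem: P(D | +).
p_d_pos = p_pos_d * p_d / p_pos
```

Even with a fairly accurate test, the posterior probability of disease given a positive result is only about 16%, because the disease is rare.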
Random Variables & Distributions
Discrete Random Variables
Probability density function (pdf) – f(x)
1. f(x) = P(X = x) is non-negative
2. Σ_x f(x) = 1
3. P(X ∈ A) = Σ_{x ∈ A} f(x)
Cumulative distribution function – F(x)
1. F(x) = P(X ≤ x) is monotonic, non-decreasing
2. F(−∞) = 0 and F(∞) = 1
3. P(a < X ≤ b) = F(b) − F(a)
Note:
For discrete random variables, F(x) is not very useful.
For continuous random variables, F(x) is fundamental.
Expectation/Mean – E(X)/μ
Expectation is the weighted average of all possible values the random variable can take on. For discrete random variables, E(X) = Σ_x x f(x).
Variance – Var(X)/σ²
We define the variance as a measure of the uncertainty of the random variable. It is the spread of the data around the mean point, i.e. how far the data tend to be from the expected value. It is calculated as the expected squared distance between the variable and its expectation. For discrete random variables, Var(X) = E[(X − E(X))²].
Theorem:
Var(X) = E(X²) − [E(X)]²
Note:
E(g(X)) = Σ_x g(x) f(x)
In general, E(g(X)) ≠ g(E(X)).
E(·) is a linear operator, i.e. E(aX + b) = a E(X) + b, where a, b are constants and X is a random variable.
Var(aX + b) = a² Var(X)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
If X and Y are independent, then Cov(X, Y) = 0. (Converse is false!)

Distributions
Bernoulli Distribution
X ~ Bernoulli(p), q = 1 − p
f(x) = P(X = x) = { p if x = 1; q if x = 0 }
E(X) = p
Var(X) = pq = p(1 − p)
Example: tossing a coin once, where p is the probability of getting a 'Head'.
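The Bernoulli moments, and the shortcut Var(X) = E(X²) − [E(X)]², can be checked on the pmf directly; a minimal sketch with p = 0.3 (my choice):

```python
p = 0.3                    # example parameter
pmf = {0: 1 - p, 1: p}     # Bernoulli(p) pmf

E = sum(x * f for x, f in pmf.items())     # expectation: should equal p
E2 = sum(x**2 * f for x, f in pmf.items())
Var = E2 - E**2                            # should equal p*(1-p)
```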
Binomial Distribution
X ~ Binomial(n, p), q = 1 − p
f(x) = P(X = x) = C(n, x) p^x q^(n−x), x = 0, 1, ..., n
E(X) = Σ_x x C(n, x) p^x q^(n−x) = np
Var(X) = npq
Example: tossing a fair coin 50 times; the probability of getting x 'Heads'.
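The binomial pmf and its moments can be checked numerically for the coin example (n = 50, p = 1/2):

```python
from math import comb

n, p = 50, 0.5
q = 1 - p

# Binomial(n, p) pmf over its whole support.
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

total = sum(pmf.values())                              # should be 1
mean = sum(x * f for x, f in pmf.items())              # should be np = 25
var = sum(x**2 * f for x, f in pmf.items()) - mean**2  # should be npq = 12.5
```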
Geometric Distribution
X ~ Geometric(p), q = 1 − p
Let X be the number of trials until the first success.
f(x) = P(X = x) = q^(x−1) p, x = 1, 2, 3, ...
P(X > x) = q^x
E(X) = 1/p
Var(X) = q/p²
Example:
Two players take turns flipping a coin; the first one to get a 'Head' wins. Given that the probability of getting a 'Head' is p (and q = 1 − p), find the probability that the player who goes first wins.
Solution:
1. Let X be the number of flips the first player needs for a head, and Y the number the second player needs, so X, Y ~ Geometric(p), independent. The first player wins iff X ≤ Y.
P(X ≤ Y) is hard to attack directly because both X and Y are random variables. Intuitively, we fix one of them so the calculation becomes easier (conditional probability).
By using conditional probability, P(X ≤ Y) = Σ_y P(X ≤ Y | Y = y) P(Y = y).
P(X ≤ Y | Y = y) = P(X ≤ y) = 1 − q^y
P(Y = y) = q^(y−1) p
P(X ≤ Y) = Σ_{y=1}^∞ (1 − q^y) q^(y−1) p = 1 − pq/(1 − q²) = 1/(1 + q)
2. Alternatively, we can use a table to show the relationship. Let n index the first player's flips:

n                            1    2      3      ...    n
P(first player wins on n)    p    q²p    q⁴p    ...    q^(2(n−1)) p

Because we are finding the probability that the first player wins, every flip the second player makes must be a failure (probability q). So for the first player to win on their second flip, the first player fails once (a factor q), the second player fails once (another factor q), and then the first player succeeds (a factor p), giving q²p. The rest of the table follows the same pattern.
P(first player wins) = Σ_{n=1}^∞ q^(2(n−1)) p = p/(1 − q²) = 1/(1 + q)
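The series and the closed form agree numerically; a quick check with a fair coin (p = 1/2, so the first player should win with probability 2/3):

```python
p = 0.5
q = 1 - p

# First player wins on their n-th flip with probability q^(2(n-1)) * p;
# truncate the geometric series after enough terms.
series = sum(q**(2 * (n - 1)) * p for n in range(1, 200))

closed_form = p / (1 - q**2)   # = 1/(1+q)
```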
Hypergeometric Distribution
X ~ Hypergeometric(N, R, k)
f(x) = P(X = x) = C(R, x) C(N − R, k − x) / C(N, k)
E(X) = kR/N
Var(X) = k (R/N)(1 − R/N)(N − k)/(N − 1)
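The pmf and moments above can be checked numerically; a sketch with N = 20, R = 7, k = 5 (my choice of parameters):

```python
from math import comb

N, R, k = 20, 7, 5

# Hypergeometric pmf over its support.
pmf = {x: comb(R, x) * comb(N - R, k - x) / comb(N, k)
       for x in range(max(0, k - (N - R)), min(R, k) + 1)}

total = sum(pmf.values())                             # should be 1
mean = sum(x * f for x, f in pmf.items())             # should be kR/N
var = sum(x**2 * f for x, f in pmf.items()) - mean**2
expected_var = k * (R / N) * (1 - R / N) * (N - k) / (N - 1)
```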
Example: the probability that x red balls are drawn when k balls are drawn in total from N balls, of which R are red and N − R are black.

Poisson Distribution
X ~ Poisson(λ)
f(x) = P(X = x) = e^(−λ) λ^x / x!, x = 0, 1, 2, ...
E(X) = λ
Var(X) = λ
Note: It is often used to model rare events.
Poisson Process with rate λ
Let N(t) be the number of events in [0, t].
1. N(0) = 0
2. For any t_0 < t_1 < ... < t_k, the increments N(t_1) − N(t_0), N(t_2) − N(t_1), ..., N(t_k) − N(t_{k−1}) are all independent.
3. P(N(t + h) − N(t) = 1) = λh + o(h) and P(N(t + h) − N(t) ≥ 2) = o(h)
Example:
Customers arrive at a store according to a Poisson process with rate λ per hour. Given that the store opens at 9 am, find the probability of exactly one customer by 9:30 am and a total of 5 customers by 11 am. Let N(t) be the number of customers t hours after opening.
P([N(0.5) = 1] ∩ [N(2) − N(0.5) = 4])
= P(N(0.5) = 1) · P(N(2) − N(0.5) = 4)   (independent increments)
= [e^(−λ/2) (λ/2)¹ / 1!] · [e^(−1.5λ) (1.5λ)⁴ / 4!]
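The calculation above can be evaluated once a rate is fixed; a sketch assuming λ = 2 customers per hour (the notes leave the rate symbolic):

```python
from math import exp, factorial

lam = 2.0   # assumed rate (customers/hour), my choice for illustration

def poisson_pmf(x, mu):
    # P(Poisson(mu) = x)
    return exp(-mu) * mu**x / factorial(x)

# N(0.5) ~ Poisson(0.5*lam) and N(2) - N(0.5) ~ Poisson(1.5*lam),
# and the two increments are independent.
p = poisson_pmf(1, 0.5 * lam) * poisson_pmf(4, 1.5 * lam)
```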
Two consequences of (3)
(i) P(N(t + h) − N(t) = 0) = 1 − λh + o(h)
(ii) N(t) ~ Poisson(λt), i.e. P(N(t) = x) = e^(−λt) (λt)^x / x!, x = 0, 1, 2, ...
In fact, N(t) − N(s) ~ Poisson(λ(t − s)) for s < t.
Note: The definition of o(h): g(h) is o(h) if lim_{h→0} g(h)/h = 0.

Negative Binomial Distribution
Let X be the number of trials until the k-th success, and Y be the number of failures before achieving k successes, so Y = X − k.
f_X(x) = P(X = x) = C(x − 1, k − 1) p^k q^(x−k), x = k, k + 1, ...
f_Y(y) = P(Y = y) = C(y + k − 1, y) p^k q^y, y = 0, 1, 2, ...
When k = 1, both X and Y are geometric random variables; for general k, both are called negative binomials.
Recall: for Geometric(p), if X is the number of trials until the first success and Y the number of failures before the first success, then P(X = x) = q^(x−1) p and P(Y = y) = q^y p.
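The negative binomial pmf for X can be checked by truncating its infinite support; a sketch with p = 0.4 and k = 3 (my parameters), where the mean should be k/p:

```python
from math import comb

p, k = 0.4, 3
q = 1 - p

# P(X = x) = C(x-1, k-1) p^k q^(x-k); truncate the tail, which decays like q^x.
pmf = {x: comb(x - 1, k - 1) * p**k * q**(x - k) for x in range(k, 400)}

total = sum(pmf.values())                   # should be ~ 1
mean = sum(x * f for x, f in pmf.items())   # should be ~ k/p = 7.5
```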
Example: Pooled blood test
Let n be the total number of people, divided into groups of k (assuming n is divisible by k). Let X be the total number of tests carried out, p the probability of infection (relatively small), with independence between people, and q = 1 − p.
1. For each group, one test suffices if all k samples are negative (probability q^k); otherwise k + 1 tests are needed (probability 1 − q^k). The expected number of tests per group is
1 · q^k + (k + 1)(1 − q^k) = k + 1 − k q^k.
2. How many tests are needed on average? We have n/k groups, so
E(X) = (n/k)(k + 1 − k q^k) = n(1 + 1/k − q^k).
3. We can optimize over k to make 2 as small as possible.
For small p, we can approximate q^k = (1 − p)^k around p = 0:
q^k ≈ 1 − kp, so E(X) ≈ n(1/k + kp).
Minimizing 1/k + kp over k gives k = 1/√p, and then E(X) ≈ 2n√p.
Multiple Random Variables
Joint Distribution
The probability that X = x and Y = y occur together.
For the discrete case, f(x, y) = P(X = x, Y = y).
Marginal Distribution
The probability that X = x regardless of the value of Y, or vice versa.
For the discrete case, f_X(x) = Σ_y f(x, y) and f_Y(y) = Σ_x f(x, y).
Conditional Distribution
The probability that X = x given that Y = y.
For the discrete case, f_{X|Y}(x|y) = f(x, y)/f_Y(y) and f_{Y|X}(y|x) = f(x, y)/f_X(x).
Summary:

Distribution                Notation                        Discrete Case
Joint Distribution          f(x, y)                         P(X = x, Y = y)
Marginal Distribution       f_X(x), f_Y(y)                  Σ_y f(x, y), Σ_x f(x, y)
Conditional Distribution    f_{X|Y}(x|y), f_{Y|X}(y|x)      f(x, y)/f_Y(y), f(x, y)/f_X(x)
Note:
The sum of all probabilities is Σ_x Σ_y f(x, y) = 1.
f_X(x) = Σ_y f(x, y), f_Y(y) = Σ_x f(x, y)
f_{X|Y}(x|y) f_Y(y) = f(x, y) = f_{Y|X}(y|x) f_X(x)
X and Y are independent if f(x, y) = f_X(x) f_Y(y) for all x, y.
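Marginals, conditionals, and the independence criterion can all be read off a small joint table; a sketch with a dependent pair of my own construction (X and Y always equal):

```python
from fractions import Fraction as F

# A small joint pmf f(x, y) on {0, 1} x {0, 1}: X = Y with probability 1.
f = {(0, 0): F(1, 2), (0, 1): F(0, 1),
     (1, 0): F(0, 1), (1, 1): F(1, 2)}

xs = {x for x, _ in f}
ys = {y for _, y in f}

f_X = {x: sum(f[x, y] for y in ys) for x in xs}   # marginal of X
f_Y = {y: sum(f[x, y] for x in xs) for y in ys}   # marginal of Y

total = sum(f.values())                           # must be 1

# Conditional f_{X|Y}(x|1) = f(x, 1)/f_Y(1).
f_X_given_Y1 = {x: f[x, 1] / f_Y[1] for x in xs}

# Independence: does f(x, y) == f_X(x) f_Y(y) hold everywhere?
independent = all(f[x, y] == f_X[x] * f_Y[y] for x in xs for y in ys)
```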
Covariance
Cov(X, Y) – on average, do X and Y move together or in opposite directions?
Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
Remark: Cov(X, X) = Var(X)
Theorem:
If a_1, ..., a_n and b_1, ..., b_m are non-random constants and X_1, ..., X_n, Y_1, ..., Y_m are random variables, then:
1. Var(Σ_i a_i X_i) = Σ_i a_i² Var(X_i) + 2 Σ_{i<j} a_i a_j Cov(X_i, X_j)
2. Cov(Σ_i a_i X_i, Σ_j b_j Y_j) = Σ_i Σ_j a_i b_j Cov(X_i, Y_j)
Note: 1 is like a square of sums, i.e. (Σ_i a_i x_i)², and 2 is like multiplying two sums, i.e. (Σ_i a_i x_i)(Σ_j b_j y_j).
Theorem:
1. Cov(X, Y) = E(XY) − E(X) E(Y) (Proof: same idea as for Var(X))
2. If X and Y are independent, then E(XY) = E(X) E(Y), so Cov(X, Y) = 0.
Fun fact: Which one is bigger, E(XY) or E(X) E(Y)? Solution: either can be; their difference is exactly Cov(X, Y) = E(XY) − E(X) E(Y), so the sign of the covariance decides.
Note: The converse of 2 is false.
Counterexample: Let X take values −1, 0, 1 with probability 1/3 each, and let Y = X². Then E(XY) = E(X³) = 0 and E(X) E(Y) = 0, so Cov(X, Y) = 0, yet X and Y are clearly not independent (Y is a function of X).
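The classical zero-covariance-but-dependent example (X uniform on {−1, 0, 1}, Y = X²) can be verified with exact arithmetic:

```python
from fractions import Fraction as F

# X uniform on {-1, 0, 1}; Y = X**2 is a deterministic function of X.
support = [-1, 0, 1]
px = F(1, 3)

E_X = sum(px * x for x in support)          # 0
E_Y = sum(px * x**2 for x in support)       # 2/3
E_XY = sum(px * x * x**2 for x in support)  # E(X^3) = 0

cov = E_XY - E_X * E_Y                      # 0, yet Y depends on X:
# P(Y = 0 | X = 0) = 1, while P(Y = 0) = 1/3.
```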
