Axiomatic approach to probability
• Axioms of probability and terminology
• Basic probability theorems
• Special cases of probability space:
- Discrete (finite and countably infinite)
- Continuous (uncountably infinite)
3.1 Axioms of probability
• An experiment, either natural or man-made, in which one of several identified results is possible, is called a random experiment.
• The possible results of the experiment are called outcomes.
• A particular realization of the experiment, leading to a particular outcome, is called a trial.
• In the axiomatic approach to probability, a random experiment is modeled as a probability space, the latter being a triplet $(S, \mathcal{F}, P)$, where
- $S$ is the sample space,
- $\mathcal{F}$ is the set of events (events algebra),
- $P(\cdot)$ is the probability function.
• These concepts are described individually below.
2003 Benoît Champagne. Compiled February 2, 2012.
• The sample space $S$ is the set of all possible results, or outcomes, of the experiment.
• In practical applications, $S$ is defined by the very nature of the problem under consideration. $S$ may be finite, countably infinite or uncountably infinite.
• The elements of $S$, i.e. the experimental outcomes, will usually be denoted by lower case letters (e.g. $s$, $a$, $x$, etc.).
Example 3.1:
▶ Consider a random experiment that consists of flipping a coin twice. A suitable sample space may be defined as
$$S = \{HH, HT, TH, TT\}$$
where, for example, outcome $HT$ corresponds to heads on the first toss and tails on the second. Here, $S$ is finite with only 4 outcomes. ◀
• In probability theory, an event $A$ is defined as a subset of $S$, i.e. $A \subseteq S$.
• Referring to a particular trial of the random experiment, we say that $A$ occurs if the experimental outcome $s \in A$.
• Special events $S$ and $\emptyset$:
- Since for any outcome $s$, we have $s \in S$ by definition, $S$ always occurs and is thus called the certain event.
- Since for any outcome $s$, we have $s \notin \emptyset$, $\emptyset$ never occurs and is thus called the impossible event.
Example 3.1 (continued):
▶ Consider the event $A$ = {getting heads on the first flip}. This can equivalently be represented by the following subset of $S$:
$$A = \{HH, HT\} \subseteq S$$
Let $s$ denote the outcome of a particular trial:
if $s = HH$ or $HT$ $\Rightarrow$ $A$ occurs
if $s = TH$ or $TT$ $\Rightarrow$ $A$ does not occur ◀
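The notion of an event "occurring" can be made concrete with a small sketch. The code below is only an illustration: the string encoding of outcomes and the helper name `occurs` are assumptions made here, not part of the notes.

```python
# Sample space for two coin flips, as in Example 3.1.
S = {"HH", "HT", "TH", "TT"}

# Event A = "heads on the first flip", represented as a subset of S.
A = {s for s in S if s[0] == "H"}


def occurs(event, outcome):
    """Return True if the event occurs for the given trial outcome."""
    return outcome in event


for s in sorted(S):
    print(s, occurs(A, s))
```

Representing events as sets means that the set operations used in the rest of the chapter (union, intersection, complement) map directly onto Python's `|`, `&` and `-`.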
• Let $\mathcal{F}$ denote the set of all events under consideration in a given random experiment. Note that $\mathcal{F}$ is a set of subsets of $S$:
- $\mathcal{F}$ must be large enough to contain all interesting events,
- but not so large as to contain impractical events that lead to mathematical difficulties. (This may be the case when $S$ is uncountably infinite, e.g. $S = \mathbb{R}$.)
• In the axiomatic approach to probability, it is required that $\mathcal{F}$ be a $\sigma$-algebra, that is:
(a) $S \in \mathcal{F}$
(b) $A \in \mathcal{F} \Rightarrow A^c \in \mathcal{F}$
(c) $A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcup_i A_i \in \mathcal{F}$
• Whenever $S$ is finite, the simplest and most appropriate choice for $\mathcal{F}$ is generally the power set $\mathcal{P}_S$.
• The proper choice for $\mathcal{F}$ when $S$ is infinite will be discussed later.
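For a finite sample space, conditions (a) to (c) can be checked mechanically. The following is a minimal sketch under that finiteness assumption; the function name `is_sigma_algebra` and the two test collections are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def is_sigma_algebra(S, F):
    """Check conditions (a)-(c) for a collection F of subsets of a finite set S."""
    S = frozenset(S)
    F = {frozenset(A) for A in F}
    if S not in F:                          # (a) S belongs to F
        return False
    if any(S - A not in F for A in F):      # (b) closed under complement
        return False
    # (c) F is finite here, so closure under pairwise unions implies
    # closure under all countable unions.
    return all(A | B in F for A, B in combinations(F, 2))


S = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}                            # heads on the first flip
small = [set(), A, S - A, S]                # contains A and its complement
not_closed = [set(), A, S]                  # complement of A is missing

print(is_sigma_algebra(S, small))       # True
print(is_sigma_algebra(S, not_closed))  # False
```

The collection `small` shows that the power set is not the only valid choice of $\mathcal{F}$, even when $S$ is finite; it is simply the most convenient one.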
Example 3.1 (continued):
▶ Consider flipping a coin twice and let $S = \{HH, HT, TH, TT\}$ be the corresponding sample space. An appropriate choice for $\mathcal{F}$ here is $\mathcal{P}_S$, i.e. the set of all subsets of $S$:
$$\mathcal{P}_S = \{\emptyset, \{HH\}, \{HT\}, \{TH\}, \{TT\}, \{HH, HT\}, \{HH, TH\}, \ldots, S\}$$
Note that $\mathcal{F} = \mathcal{P}_S$ contains $16 = 2^4$ different subsets, i.e. events, that may or may not occur during a particular realization of the random experiment. For example, the event $\{HH, HT, TH\} \in \mathcal{F}$ corresponds to obtaining at least one heads when you flip the coin twice.
If you think about it, each event corresponds to a specific statement about the experimental outcome and here, there are only 16 possible different statements of this type that can be made. ◀
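The count of 16 events can be confirmed by listing the power set explicitly. This is a sketch only; the helper name `power_set` is an assumption introduced here.

```python
from itertools import chain, combinations


def power_set(S):
    """Return all subsets of the finite set S as a list of frozensets."""
    items = sorted(S)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]


S = {"HH", "HT", "TH", "TT"}
F = power_set(S)

print(len(F))                                # 16 == 2**4
print(frozenset({"HH", "HT", "TH"}) in F)    # "at least one heads" is an event
```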
The probability function:
• $P$ is a function that maps events $A$ in $\mathcal{F}$ into real numbers in $\mathbb{R}$, that is:
$$P : A \in \mathcal{F} \to P(A) \in \mathbb{R} \tag{3.1}$$
The number $P(A)$ is called the probability of the event $A$.
• The function $P(\cdot)$ must satisfy the following axioms:
Axiom 1: The function $P$ is non-negative:
$$P(A) \ge 0 \tag{3.2}$$
Axiom 2: The function $P$ is normalized so that
$$P(S) = 1 \tag{3.3}$$
Axiom 3: Let $A_1, A_2, A_3, \ldots$ be a sequence of mutually exclusive events, that is, $A_i \cap A_j = \emptyset$ for $i \ne j$. Then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i) \tag{3.4}$$
• From an operational viewpoint, the number $P(A)$ may be interpreted as a measure of the likelihood of event $A$ in a particular realization of the experiment.
• If $P(A) = P(B)$, we say that events $A$ and $B$ are equally likely (this does NOT imply that $A = B$).
• As a special case of Axiom 3, it follows that for any events $A$ and $B$,
$$A \cap B = \emptyset \Rightarrow P(A \cup B) = P(A) + P(B) \tag{3.5}$$
• In the special case of a finite sample space $S$, it can be shown that (3.5) is in fact equivalent to Axiom 3. Thus, when $S$ is finite, we may replace Axiom 3 (infinite additivity) by the simpler condition (3.5).
Example 3.1 (continued):
▶ Let the function $P$ be defined as follows, for any $A \in \mathcal{F}$:
$$P(A) \triangleq \frac{N(A)}{4}$$
where $N(A)$ denotes the number of elements in subset $A$. For example, consider event $A$ = {at least one tails}; we have
$$A = \{TH, HT, TT\} \Rightarrow N(A) = 3 \Rightarrow P(A) = \frac{3}{4}$$
It can be verified easily that function $P$ satisfies all the axioms of probability:
- Axiom 1: For any event $A$, $N(A) \ge 0$ and therefore, $P(A) = N(A)/4 \ge 0$.
- Axiom 2: Since $N(S) = 4$, we immediately obtain $P(S) = N(S)/4 = 1$.
- Axiom 3: Observe that if $A \cap B = \emptyset$, then $N(A \cup B) = N(A) + N(B)$ and therefore
$$P(A \cup B) = \frac{N(A \cup B)}{4} = \frac{N(A)}{4} + \frac{N(B)}{4} = P(A) + P(B)$$
◀
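The verification above can also be carried out exhaustively over all 16 events. The sketch below assumes the power set as events algebra and, since $S$ is finite, checks condition (3.5) on disjoint pairs in place of Axiom 3. Exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import chain, combinations

S = frozenset({"HH", "HT", "TH", "TT"})
items = sorted(S)
# Events algebra: the power set of S (16 events).
F = [frozenset(c) for c in chain.from_iterable(
    combinations(items, k) for k in range(len(items) + 1))]


def P(A):
    """P(A) = N(A) / 4, with N(A) the number of outcomes in A."""
    return Fraction(len(A), len(S))


axiom1 = all(P(A) >= 0 for A in F)
axiom2 = P(S) == 1
# Finite S: condition (3.5) on disjoint pairs stands in for Axiom 3.
axiom3 = all(P(A | B) == P(A) + P(B)
             for A in F for B in F if not (A & B))

print(axiom1, axiom2, axiom3)
print(P({"TH", "HT", "TT"}))    # event "at least one tails"
```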
3.2 Basic theorems
Introduction: Several basic properties follow from the axiomatic definition of the probability function $P(A)$. These are listed below as theorems along with their proofs.
Theorem 3.1: For any event $A \in \mathcal{F}$:
$$P(A^c) = 1 - P(A) \tag{3.6}$$
Proof: Observe that $A \cap A^c = \emptyset$ and $A \cup A^c = S$. Thus, using Axiom 3 and Axiom 2, we have: $P(A) + P(A^c) = P(A \cup A^c) = P(S) = 1$, or equivalently, $P(A^c) = 1 - P(A)$. □
Corollary: For any event $A \in \mathcal{F}$:
$$0 \le P(A) \le 1 \tag{3.7}$$
Proof: Left as exercise. □
Theorem 3.2:
$$P(\emptyset) = 0 \tag{3.8}$$
Proof: Observe that $\emptyset = S^c$. Thus, invoking Theorem 3.1 and Axiom 2, we have: $P(\emptyset) = P(S^c) = 1 - P(S) = 0$. □
Theorem 3.3: If $A \subseteq B$, then
(a) $P(B - A) = P(B) - P(A)$ (3.9)
(b) $P(A) \le P(B)$ (3.10)
Proof: Since $A \subseteq B$, set $B$ may be expressed as the union $B = A \cup (B - A)$ where $A$ and $B - A$ are mutually exclusive, that is $A \cap (B - A) = \emptyset$. The Venn diagram below illustrates this situation:
Figure 3.1: Venn diagram for Theorem 3.3.
Using Axiom 3, we have
$$P(B) = P(A \cup (B - A)) = P(A) + P(B - A) \tag{3.11}$$
which proves part (a). To prove part (b), simply note (see Axiom 1) that $P(B - A) \ge 0$. □
Theorem 3.4: For arbitrary events $A$ and $B$, we have
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{3.12}$$
Proof: Observe that for any events $A$ and $B$, we can always write
$$A \cup B = A \cup (B - (A \cap B)) \tag{3.13}$$
where $A$ and $B - (A \cap B)$ are mutually exclusive. This is illustrated by means of a Venn diagram below:
Figure 3.2: Venn diagram for Theorem 3.4. (Note: $AB \equiv A \cap B$.)
Invoking Axiom 3, we first obtain
$$P(A \cup B) = P(A) + P(B - (A \cap B))$$
Since $A \cap B \subseteq B$, Theorem 3.3 yields
$$P(B - (A \cap B)) = P(B) - P(A \cap B)$$
Eq. (3.12) follows by combining the above two identities. □
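Theorems 3.3 and 3.4 hold for any probability function, not only for equally likely outcomes. As a sanity check, the sketch below tests both identities over every pair of events on the two-flip sample space. The outcome probabilities used (a coin with probability of heads equal to 1/3) are a hypothetical choice made here for illustration and do not come from the notes.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical outcome probabilities (two flips of a coin with P(heads) = 1/3).
mass = {"HH": Fraction(1, 9), "HT": Fraction(2, 9),
        "TH": Fraction(2, 9), "TT": Fraction(4, 9)}
S = frozenset(mass)
F = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(S), k) for k in range(len(S) + 1))]


def P(A):
    """Probability of event A as the sum of its outcome probabilities."""
    return sum((mass[s] for s in A), Fraction(0))


# Theorem 3.3: if A is a subset of B, P(B - A) = P(B) - P(A) and P(A) <= P(B).
thm33 = all(P(B - A) == P(B) - P(A) and P(A) <= P(B)
            for A in F for B in F if A <= B)
# Theorem 3.4: P(A or B) = P(A) + P(B) - P(A and B) for arbitrary events.
thm34 = all(P(A | B) == P(A) + P(B) - P(A & B) for A in F for B in F)

print(thm33, thm34)
```

A numerical check of this kind does not replace the proofs; it only confirms that the identities behave as stated on one particular probability space.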
• Theorem 3.4 may be generalized to a union of more than two events.
• In the case of three events, say $A$, $B$ and $C$, the following relation can be derived:
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC) \tag{3.14}$$
• The above formula can be proved by repeated application of Theorem 3.4. This is left as an exercise.
• See the textbook for a more general formula applicable to a union of $n$ events, where $n$ is an arbitrary positive integer.
Theorem 3.5: For any events $A$ and $B$:
$$P(A) = P(A \cap B) + P(A \cap B^c) \tag{3.15}$$
Proof: The theorem follows from Axiom 3 by noting that $A \cap B$ and $A \cap B^c$ are mutually exclusive and that their union is equal to $A$ (see Fig. 3.3). □
Figure 3.3: Venn diagram for Theorem 3.5.
▶ In a certain city, three daily newspapers are available, labelled here as $A$, $B$ and $C$ for simplicity. The probability that a randomly selected person reads newspaper $A$ is $P(A) = 0.25$. Similarly, for newspapers $B$ and $C$, we have $P(B) = 0.20$ and $P(C) = 0.13$. The probability that a person reads both $A$ and $B$ is $P(AB) = P(A \cap B) = 0.10$. In the same way, $P(AC) = 0.08$, $P(BC) = 0.05$ and $P(ABC) = 0.04$.
(a) What is the probability that a randomly selected person does not read any of these three newspapers?
(b) What is the probability that this person reads only $B$, i.e. reads $B$ but neither $A$ nor $C$?
◀
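One possible way to work out both questions is sketched below. It is not necessarily the solution intended in the notes. Part (a) uses the three-event union formula followed by the complement rule of Theorem 3.1. Part (b) writes "only $B$" as $B - (AB \cup BC)$ and applies Theorems 3.3 and 3.4. The variable names are assumptions made here.

```python
from fractions import Fraction as Fr

# Probabilities given in the example.
pA, pB, pC = Fr("0.25"), Fr("0.20"), Fr("0.13")
pAB, pAC, pBC, pABC = Fr("0.10"), Fr("0.08"), Fr("0.05"), Fr("0.04")

# (a) Three-event union formula, then the complement rule (Theorem 3.1).
p_union = pA + pB + pC - pAB - pAC - pBC + pABC
p_none = 1 - p_union

# (b) Reads B only: remove from B the readers who also read A or C.
#     P(AB or BC) = P(AB) + P(BC) - P(ABC) by Theorem 3.4.
p_only_b = pB - (pAB + pBC - pABC)

print(float(p_union), float(p_none), float(p_only_b))
```

Under these steps, $P(A \cup B \cup C) = 0.39$, so the answer to (a) is $0.61$, and the answer to (b) is $0.09$.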
Theorem 3.6: For any increasing or decreasing sequence of events $A_1, A_2, A_3, \ldots$:
$$\lim_{i \to \infty} P(A_i) = P\left(\lim_{i \to \infty} A_i\right) \tag{3.16}$$
• Recall that a sequence $A_i$, $i \in \mathbb{N}$, is increasing if $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$, in which case we define $\lim_{i \to \infty} A_i = \bigcup_{i=1}^{\infty} A_i$.
• Similarly, a sequence $A_i$, $i \in \mathbb{N}$, is decreasing if $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$, in which case we define $\lim_{i \to \infty} A_i = \bigcap_{i=1}^{\infty} A_i$.
• Theorem 3.6 is essentially a statement about the continuity of the probability function $P$.
• Specifically, it says that under proper conditions on the sequence $A_i$ (i.e. increasing or decreasing), the limit operation in (3.16) can be passed inside the argument of $P(\cdot)$.
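A small numerical illustration of (3.16) may help. The probability space used below is a hypothetical choice made for this sketch: $S = \{1, 2, 3, \ldots\}$ with $P(\{k\}) = 2^{-k}$, and the increasing sequence $A_i = \{1, \ldots, i\}$, whose union is $S$. The theorem then predicts that $P(A_i)$ approaches $P(S) = 1$.

```python
from fractions import Fraction


def P_A(i):
    """P(A_i) for A_i = {1, ..., i}, assuming P({k}) = 2**-k on S = {1, 2, 3, ...}."""
    return sum((Fraction(1, 2**k) for k in range(1, i + 1)), Fraction(0))


# A_1, A_2, ... is increasing and its union is S, so (3.16) predicts
# that P(A_i) approaches P(S) = 1; the last column is the remaining gap.
for i in (1, 2, 5, 10, 20):
    print(i, float(P_A(i)), float(1 - P_A(i)))
```

Here the gap $1 - P(A_i)$ equals $2^{-i}$, which shrinks to zero as $i$ grows.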
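Proof (optional reading): First consider the case of an increasing sequence, i.e. $A_1 \subseteq A_2 \subseteq A_3 \subseteq \cdots$. Define a new sequence of events as follows: $B_1 = A_1$ and $B_i = A_i - A_{i-1}$ for any integer $i \ge 2$. Note that the events $B_i$ so defined are mutually exclusive: $B_i \cap B_j = \emptyset$ if $i \ne j$. Furthermore, the following relations hold:
$$\bigcup_{j=1}^{i} B_j = A_i, \qquad \bigcup_{j=1}^{\infty} B_j = \bigcup_{j=1}^{\infty} A_j$$
Making use of the above results together with Axiom 3, we first obtain:
$$P\left(\lim_{i \to \infty} A_i\right) = P\left(\bigcup_{j=1}^{\infty} A_j\right) = P\left(\bigcup_{j=1}^{\infty} B_j\right) = \sum_{j=1}^{\infty} P(B_j) \tag{3.17}$$
Finally, the infinite summation can be expressed in terms of limits as follows:
$$\sum_{j=1}^{\infty} P(B_j) = \lim_{i \to \infty} \sum_{j=1}^{i} P(B_j) = \lim_{i \to \infty} P\left(\bigcup_{j=1}^{i} B_j\right) = \lim_{i \to \infty} P(A_i) \tag{3.18}$$
A proof of (3.16) for decreasing sequences can be derived in a somewhat similar way. □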
3.3 Discrete probability space
• In many applications of probability (games of chance, simple engineering problems, etc.), the sample space $S$ is either finite or countably infinite. The word discrete is used to describe either of these two situations.
• Specifically, we say that a probability space $(S, \mathcal{F}, P)$ is discrete whenever the sample space $S$ is finite or countably infinite.
• In this section, we discuss discrete spaces along with related special cases.
3.3.1 Finite probability space
• The sample space $S$ is a finite set consisting of $N$ distinct elements:
$$S = \{s_1, s_2, \ldots, s_N\} \tag{3.19}$$
where $N$ is a positive integer and $s_i$ denotes the $i$th possible outcome.
• In the finite case, it is most convenient to take for events algebra the power set of the sample space $S$:
$$\mathcal{F} = \mathcal{P}_S = \text{set of all subsets of } S = \{\emptyset, \{s_1\}, \{s_2\}, \ldots, \{s_N\}, \{s_1, s_2\}, \{s_1, s_3\}, \ldots, S\} \tag{3.20}$$
• That is, the events algebra consists of all possible subsets of $S$. Indeed, in the finite case, it is usually neither advantageous nor necessary to exclude certain subsets of $S$ from $\mathcal{F}$.
• Recall that $\mathcal{P}_S$, the power set of $S$, contains $2^N$ distinct elements (i.e. subsets). Thus, there are $2^N$ possible events or different statements that can be made about the experimental outcome.
• In the finite case, a standard way to define the probability function $P(\cdot)$ is via the introduction of a probability mass $p_i$
• To each $s_i \in S$, $i = 1, \ldots$