University of Toronto Scarborough
Statistics
STAB22H3
Ken Butler
Fall
STAB22 LEC03
Covers:
- missed portions of chapter 3
- remaining portion of chapter 4
[CHAPTER3]
[11]
CONTINGENCY TABLES
- a table built from 2 or more categorical variables (cvar's)
(ex) year, opinion
[12]
READING A CONTINGENCY TABLE
- 2 cvar's
  - gender
  - status of offer

                 Status of Offer
Gender      ACCEPTED   REJECTED   TOTAL
MALE             490        210     700
FEMALE           280        220     500
TOTAL            770        430    1200
- the 4 inner cells represent the 4 combinations of gender and offer status:
  1) male, accepted
  2) male, rejected
  3) female, accepted
  4) female, rejected
- the TOTAL row and column are included so we know what each count is "out of"
- each TOTAL row or column totals up ONE categorical variable
  ex. the TOTAL row totals up "Status of Offer" (770 accepted, 430 rejected)
  ex. the TOTAL column totals up "Gender" (700 male, 500 female)
Questions
- How many of the applicants were males who were rejected?
  - total: applicants = 1200
  - value of interest: 2) male, rejected = 210
  - relative frequency: 210/1200 * 100% = 17.5%
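
The totals and the relative frequency above can be checked with a short Python sketch. The four counts come from the table; the variable names (`counts`, `row_totals`, `col_totals`) are illustrative choices, not anything from the lecture.

```python
# Counts from the table: rows are gender, columns are status of offer.
counts = {
    "MALE":   {"ACCEPTED": 490, "REJECTED": 210},
    "FEMALE": {"ACCEPTED": 280, "REJECTED": 220},
}

# TOTAL column: one total per gender (adds across each row).
row_totals = {gender: sum(cells.values()) for gender, cells in counts.items()}

# TOTAL row: one total per offer status (adds down each column).
col_totals = {}
for cells in counts.values():
    for status, n in cells.items():
        col_totals[status] = col_totals.get(status, 0) + n

# Bottom-right cell: all applicants.
grand_total = sum(row_totals.values())

# Relative frequency of "male, rejected" out of all applicants.
male_rejected_pct = counts["MALE"]["REJECTED"] / grand_total * 100

print(row_totals)
print(col_totals)
print(grand_total)
print(round(male_rejected_pct, 1))
```

Both sets of totals add up to the same grand total, which is why the bottom-right cell can serve as the "out of" value for questions about all applicants.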
[13]
PERCENTAGE OF TOTAL
                 Status of Offer
Gender      ACCEPTED   REJECTED   TOTAL
MALE             490        210     700
FEMALE           280        220     500
TOTAL            770        430    1200
Total Percent
- here we divide EVERYTHING by the bottom-right cell (the grand total, 1200), then multiply by 100%
- (ex) 210/1200 * 100% = 17.5% of all applicants were males who were rejected
- the result (output from MyStatCrunch) is the percentage of total = Joint Distribution
- questions about this will be worded like "out of all people who applied..."
- can recognize total percent by noticing that the bottom-right cell is 100%
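
As a rough check on the total-percent idea, the sketch below divides every cell of the table by the grand total. It uses plain Python rather than StatCrunch, and the names are illustrative only.

```python
# Counts from the table: rows are gender, columns are status of offer.
counts = {
    "MALE":   {"ACCEPTED": 490, "REJECTED": 210},
    "FEMALE": {"ACCEPTED": 280, "REJECTED": 220},
}

# Bottom-right cell of the table: all applicants.
grand_total = sum(n for cells in counts.values() for n in cells.values())

# Total percent: every cell divided by the grand total, times 100.
total_pct = {
    gender: {status: round(n / grand_total * 100, 2) for status, n in cells.items()}
    for gender, cells in counts.items()
}

for gender, cells in total_pct.items():
    print(gender, cells)

# The four inner cells together account for 100% of applicants,
# which is what the bottom-right cell of a total-percent table shows.
all_cells_pct = sum(n for cells in counts.values() for n in cells.values()) / grand_total * 100
print(all_cells_pct)
```

Each inner percentage answers an "out of all people who applied" question, e.g. about 40.83% of all applicants were males who were accepted.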
[14]
JOINT DISTRIBUTION (aka TOTAL PERCENT)
- divide everything by the bottom-right cell, which corresponds to TOTAL row, TOTAL column
- cannot be used to answer a question like "are more males accepted than females?"
CONDITIONAL DISTRIBUTION
Row percent
- divide each cell in a row by that row's TOTAL value
- this can answer the question: out of all males, what % were accepted?
- found that a higher % of males than of females were accepted
  (ex)
  70% of males were accepted (490/700)
  56% of females were accepted (280/500)
- can recognize row percent by noticing that the TOTAL column is all 100%
[15]
Column percent
- divide each cell in a column by that column's TOTAL value
- can recognize column percent by noticing that the TOTAL row is all 100%
- (ex) 63.6% of the people accepted were males (490/770)
- this does not answer whether a higher % of all males than of all females were accepted (that is answered by row percent)
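
To see the two conditional distributions side by side, here is a minimal Python sketch computing row percents and column percents from the same four counts. The names `row_pct` and `col_pct` are illustrative.

```python
# Counts from the table: rows are gender, columns are status of offer.
counts = {
    "MALE":   {"ACCEPTED": 490, "REJECTED": 210},
    "FEMALE": {"ACCEPTED": 280, "REJECTED": 220},
}
statuses = ["ACCEPTED", "REJECTED"]

# Row percent: each cell divided by its row (gender) total.
row_pct = {}
for gender, cells in counts.items():
    row_total = sum(cells.values())
    row_pct[gender] = {s: round(cells[s] / row_total * 100, 1) for s in statuses}

# Column percent: each cell divided by its column (offer status) total.
col_totals = {s: sum(cells[s] for cells in counts.values()) for s in statuses}
col_pct = {
    gender: {s: round(cells[s] / col_totals[s] * 100, 1) for s in statuses}
    for gender, cells in counts.items()
}

print(row_pct)  # each gender's row adds to 100%
print(col_pct)  # each status column adds to 100%
```

The row percents give "out of all males, 70% were accepted", while the column percents give "out of all people accepted, 63.6% were males". Only the first compares acceptance rates between genders.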
We want: is a higher % of MALES accepted than of FEMALES?
(so we want ROW percent, which takes all applicants of one gender as the base and lets us compare the % accepted for each gender)
[16]
DECIDING BETWEEN ROW AND COLUMN PERCENTS
OUTCOME
- an outcome is a variable whose values are not fixed
- ex. "50% of people were males": gender is fixed, so this is not an outcome
- ex. "50% of male applicants were rejected": rejected is an outcome
=> accepted/rejected are outcomes, which for the gender/offer table above (gender in rows) come from Row Percents
[17]
Another example: AIRLINE PUNCTUALITY
- outcome is either on time or delayed, which comes from column percents
- after the "were ____", we want something that is NON-FIXED
  (ex) do NOT want: "66.2% of the flights on time were America West"; airline is not an outcome
- if a flight is with a certain airline, then we are stuck with that, ie. it is fixed: the flight can only be with that airline
- however, if the result can differ, then that variable gives outcomes (ex. Flight Status: "On time" or "Delayed")
Note about Outcomes
- data about things that are fixed (unchanging) does NOT count as an outcome
(ex) of a question answered by row percent
- out of all flights on time, what % belonged to America West?
Observations from Column Percentage
- America West: 87% on time, 13% delayed
- Alaska: 56.7% on time, 18.27% delayed
=> Alaska is less punctual than America West
- on time, delayed are outcomes
[18]
THREE CATEGORICAL VARIABLES AND SIMPSON'S PARADOX
- although a contingency table can fit only 2 cvar's at a time, we have two contingency tables, one per school, that share two variables (offer status, gender)
- the tables differ in a third variable: school
=> 3 categorical var's here
- school
- offer status
- gender
[19]
PROFESSIONAL SCHOOLS
Overall
- higher % of males (70%) accepted than females (56%)
Law school
- higher % of females (33%) accepted than males (10%)
Business school
- higher % of females (90%) accepted than males (80%)
Implication
- even though males are accepted at a higher % overall, if you look at the two cases separately (ie. law school contingency table, business school contingency table), females are accepted at a higher % in each
[20]
Why? Why do we get this answer?
Observe the third variable: School
Observe: Law school
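- 100 males applied, 10 got in
- but a lot more females (300) applied, and 100 got in
- females tend to apply to law school, where it is harder to get in
- so the low law school acceptance rate weighs more heavily on the overall % for females than for males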
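
The reversal can be reproduced with a small Python sketch. The law school counts are the ones stated above. The business school counts are an assumption: they are obtained by subtracting the law school counts from the overall table (700 males with 490 accepted, 500 females with 280 accepted), and they agree with the 80% and 90% quoted earlier. All names are illustrative.

```python
# (applied, accepted) by school and gender.
# Law school counts are stated in the notes. Business school counts are an
# assumption: the overall table minus the law school counts.
schools = {
    "law":      {"MALE": (100, 10),  "FEMALE": (300, 100)},
    "business": {"MALE": (600, 480), "FEMALE": (200, 180)},
}

def acceptance_pct(applied, accepted):
    """Percent accepted out of those who applied, to 1 decimal place."""
    return round(accepted / applied * 100, 1)

# Acceptance rate within each school, by gender.
per_school = {
    school: {gender: acceptance_pct(*pair) for gender, pair in by_gender.items()}
    for school, by_gender in schools.items()
}

# Overall acceptance rate and share of applications sent to law school.
overall = {}
law_share = {}
for gender in ("MALE", "FEMALE"):
    applied = sum(by_gender[gender][0] for by_gender in schools.values())
    accepted = sum(by_gender[gender][1] for by_gender in schools.values())
    overall[gender] = acceptance_pct(applied, accepted)
    law_share[gender] = round(schools["law"][gender][0] / applied * 100, 1)

print(per_school)  # females ahead within each school
print(overall)     # males ahead overall
print(law_share)   # females apply mostly to the harder school
```

Under these counts, 60% of female applications go to law school against about 14.3% of male applications, which is what pulls the overall female acceptance rate below the male one.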
