# CRIM 320 Lecture Notes - Null Hypothesis, Joint Probability Distribution, Frequency Distribution


13 Apr 2012


Crim 320

Week 9

March 5th, Post-Midterm

Association between two categorical variables

Research Questions

- Testing the null hypothesis that two categorical variables (nominal, ordinal) are independent

- Moffitt’s (1993) life-course persistent theory of offending

o Research hypothesis

o H1: early-onset offenders are more likely than late-onset offenders to become violent offenders later in life

o Null hypothesis

o H0: early-onset offenders are not more likely than late-onset offenders to become violent offenders later in life

CONTINGENCY TABLE

Definition: The joint frequency distribution of two categorical variables refers to the simultaneous occurrence of an event from the first variable and another event from the second variable (Bachman et al., 2004)

2 x 2 Contingency Table

| Rows \ Columns | 1 | 2 | Row marginals |
|---|---|---|---|
| 1 | A | B | R1 |
| 2 | C | D | R2 |
| Column marginals | C1 | C2 | n (total sample size) |

- Distribution of two categorical variables

- Analysis can only be done on cases where information is available for both variables, e.g., if the second chart has missing information for a number of kids, pay attention to it. Only take the valid percentage into account

- 2x2 table = 4 possible outcomes

o (1) no early onset – no fight between 15-18

o (2) early onset – no fight between 15-18

o (3) no early onset – fight between 15-18

o (4) early onset – fight between 15-18
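The four outcomes above can be counted from case-level data. The following is a minimal sketch, assuming each case is stored as a pair (early onset, fought between 15-18) with `None` for missing information. The records and the helper name `cross_tabulate` are invented for illustration and are not from the lecture.

```python
from collections import Counter

# Hypothetical case-level records: (early_onset, fought_15_18).
# None marks missing information; these values are made up for illustration.
records = [
    (False, False),
    (True, True),
    (False, True),
    (True, None),   # missing on the second variable: excluded
    (None, False),  # missing on the first variable: excluded
    (False, False),
]


def cross_tabulate(rows):
    """Count joint outcomes, keeping only cases valid on both variables."""
    valid = [(x, y) for x, y in rows if x is not None and y is not None]
    return Counter(valid), len(valid)


table, n_valid = cross_tabulate(records)
for onset in (False, True):
    for fight in (False, True):
        print(f"early onset={onset}, fight={fight}: {table[(onset, fight)]}")
print("valid n =", n_valid)
```

Only the cases that are valid on both variables contribute to the cell counts and to n, which is why the valid percentage is the one to report.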

Percentage difference

- A simple way to investigate a relationship between two categorical variables

- For each cell – the outcome is divided by the row marginal and multiplied by 100

- For late onset offenders

o 60.6% did not fight (215/355*100)

o 39.4% did fight (140/355*100)

- For early onset offenders

o 35.3% did not fight (12/34*100)

o 64.7% did fight (22/34*100)

- The percentage difference in prevalence of fighting in late adolescence between the early-onset and late-onset offenders is:

o 25.3% = (64.7% - 39.4%)

o 25% is pretty high but is it statistically significant? Or is it the result of sampling error?
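The row percentages and the percentage difference above can be reproduced with a short sketch. The counts are the ones given in the notes; the variable and function names are illustrative assumptions.

```python
# Observed counts from the notes: each row is an onset group,
# and the two columns are (did not fight, did fight) between 15-18.
observed = {
    "late onset": (215, 140),
    "early onset": (12, 22),
}


def row_percentages(counts):
    """Divide each cell by its row marginal and multiply by 100."""
    total = sum(counts)
    return tuple(round(100 * c / total, 1) for c in counts)


percentages = {group: row_percentages(counts) for group, counts in observed.items()}

# Difference in the prevalence of fighting (second column) between groups.
difference = round(percentages["early onset"][1] - percentages["late onset"][1], 1)

for group, (no_fight, fight) in percentages.items():
    print(f"{group}: {no_fight}% did not fight, {fight}% did fight")
print(f"percentage difference: {difference}%")
```

The sketch only describes the sample. Whether a difference of this size could arise from sampling error is the question the chi-square test addresses.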

How can we interpret this difference?

- Can we reject the null hypothesis that there is no association between two variables?

o What percentage difference would be expected by chance alone?

o What percentage difference would be large enough to reject the null hypothesis?

- Based on Moffitt’s theory, the variables are expected to be related because early-onset offenders are more likely to be characterized by neuropsychological deficits, which imply low self-control and manifest in committing violent crime when older (fighting between 15-18)

- This indicates a correlation, not necessarily causation

CHI-SQUARE TEST OF INDEPENDENCE

- Two-sample chi-square

- Tests the null hypothesis that two categorical variables are independent of each other

- Definition: statistical test used for assessing how well the distribution of observed frequencies of a categorical variable fits the distribution of expected frequencies

Observed frequencies

- Joint distribution of the two categorical variables in the sample

Expected frequencies

- Joint distribution we would expect if the two categorical variables were independent of each other
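As a rough illustration of the observed versus expected distinction, the sketch below applies the standard independence rule (row total times column total, divided by the total sample size) to the onset-by-fighting counts used earlier in these notes. The function name and the table layout are assumptions made for the example.

```python
# Observed 2x2 table from the notes.
# Rows: late onset, early onset. Columns: did not fight, did fight.
observed = [
    [215, 140],
    [12, 22],
]


def expected_frequencies(table):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]


expected = expected_frequencies(observed)
for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 2) for e in exp_row])
```

The expected counts keep the same row and column marginals as the observed table; only the way the cases are spread across the four cells changes.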

Calculating the Expected Frequencies

Formula: Fe = (CS x RS) / GS. Compute this for each of the four cells (A, B, C, D). GS is the value in the bottom-right corner of the table, and it is the same for all four cells.

Fe = expected frequency

CS = Column sum