# PSYB07H3 Lecture Notes - Interquartile Range, Variance, Dependent And Independent Variables

TUTORIAL NOTES

Population vs. Sample

Population: the entire collection of events/participants that the experiment is about

Sample: a small, representative subset of the population that is actually used in the experiment

Random sampling: every individual in the population should have an equal chance of being chosen (no inherent differences between sample and population; must do your best for the sample to be random).

*this is what gives a study external validity

Random assignment: each individual in the sample is assigned at random to one of the conditions of the study

*this is what gives a study internal validity
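A minimal sketch of the two procedures, assuming a made-up population of 100 labelled people and two made-up conditions ("control" and "treatment"); none of these names come from the tutorial. The seed is fixed only so the sketch gives the same result each time it is run.

```python
import random

rng = random.Random(7)  # fixed seed: repeatable illustration only

# Hypothetical population of 100 people (labels are invented).
population = [f"person_{i}" for i in range(100)]

# Random sampling: every person has the same chance of being chosen.
sample = rng.sample(population, k=20)

# Random assignment: shuffle the sample, then deal people out to the
# conditions in turn so the groups end up the same size.
conditions = ["control", "treatment"]  # hypothetical condition names
shuffled = sample[:]
rng.shuffle(shuffled)
assignment = {
    person: conditions[i % len(conditions)]
    for i, person in enumerate(shuffled)
}

print(len(sample), "people sampled")
print(list(assignment.items())[:4])
```

Sampling decides who gets into the study; assignment decides which condition each sampled person receives.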

-when conducting an experiment, there are two kinds of variables: independent and dependent

Independent Variable: something in our control; we can manipulate it

Dependent Variable: depends on the independent variable (the measured result of the study)

e.g. effect of drug on memory

Conditions: No Drug | Drug A | Drug B | Drug C

Independent variables: selection of drug, gender

Dependent variable: memory scores

-greater memory in the Drug C group

-in general, males have higher memory scores than females

Discrete Variable: can take on only a limited number of values

-the independent variables in the drug and memory experiment are discrete (only four options for drug and two options for gender)

Continuous Variable: can take on any value within its range (no restricted set of values)

Categorical Data: data divided into categories

-drug condition and gender in the drug and memory experiment are categorical

Measurement Data: data obtained by measuring against some scale

-therefore, memory scores are continuous (can take on any value) and measurement data

Summation Notation

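1. $\sum x = x_1 + x_2 + x_3 + \dots + x_n$

2. $\sum x^2 = x_1^2 + x_2^2 + x_3^2 + \dots + x_n^2$ (square each score first, then add)

3. $(\sum x)^2 = (x_1 + x_2 + x_3 + \dots + x_n)^2$ (add first, then square the total)

4. $\sum (x - y^2)$: square y first, subtract it from x, and add it all up

5. $\sum (x - y)^2$: subtract first, square each difference, and then add it all up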
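The five expressions are easy to mix up, so here is a small sketch that evaluates each one on made-up numbers. The lists `x` and `y` are hypothetical and chosen only to keep the arithmetic easy to check by hand.

```python
# Hypothetical paired scores (not from the tutorial).
x = [1, 2, 3]
y = [1, 0, 2]

sum_x = sum(x)                                    # 1 + 2 + 3 = 6
sum_of_squares = sum(xi ** 2 for xi in x)         # 1 + 4 + 9 = 14
square_of_sum = sum(x) ** 2                       # 6 squared = 36

# Rule 4: square y first, subtract from x, then add.
sum_x_minus_y_sq = sum(xi - yi ** 2 for xi, yi in zip(x, y))    # 0 + 2 + (-1) = 1

# Rule 5: subtract first, square each difference, then add.
sum_sq_diff = sum((xi - yi) ** 2 for xi, yi in zip(x, y))       # 0 + 4 + 1 = 5

print(sum_x, sum_of_squares, square_of_sum, sum_x_minus_y_sq, sum_sq_diff)
```

Note that rules 2 and 3 give different answers (14 vs. 36), as do rules 4 and 5 (1 vs. 5): the position of the square matters.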

Measures of Central Tendency

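Mean: $\bar{y} = \frac{\sum y}{n}$

$\sum y = n \cdot \bar{y}$

Grand mean: $\frac{\sum n_j \bar{y}_j}{\sum n_j}$

*this is used when you don't have the actual data (only each group's size $n_j$ and mean $\bar{y}_j$)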
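As a rough illustration of the grand mean formula, the sketch below combines three groups when only their sizes and means are known. The group sizes and means are invented for the example.

```python
# Hypothetical groups: only the size and the mean of each group are known.
group_sizes = [10, 20, 30]       # n_j
group_means = [4.0, 5.0, 6.0]    # mean of group j

# Each n_j * mean_j recovers that group's total, because sum(y) = n * mean.
group_totals = [n * m for n, m in zip(group_sizes, group_means)]

grand_mean = sum(group_totals) / sum(group_sizes)        # 320 / 60
unweighted = sum(group_means) / len(group_means)         # 15 / 3 = 5.0

print(grand_mean, unweighted)
```

The grand mean (about 5.33) is pulled toward the largest group, which is why it differs from the plain average of the three means (5.0).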

Set of Scores: 1, 2, 5, 5, 5, 7, 8, 8, 9, 9, 10, 10

Median location = $(n + 1)/2 = (12 + 1)/2 = 6.5$, i.e. the 6.5th position (the median lies halfway between the 6th and 7th numbers)

Median: 7.5

Mode: 5

-Median and mode are resistant to outliers (outliers draw the mean towards themselves).
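A quick check of these values with Python's `statistics` module, assuming the twelve scores implied by n = 12 and a median of 7.5. The `with_outlier` list is an invented variation (the last 10 replaced by 100) used only to show the effect of an outlier.

```python
from statistics import mean, median, mode

# Assumed set of twelve scores (n = 12, as in the median-location formula).
scores = [1, 2, 5, 5, 5, 7, 8, 8, 9, 9, 10, 10]

median_position = (len(scores) + 1) / 2     # 6.5 -> between 6th and 7th score
print(median_position, median(scores), mode(scores))    # 6.5 7.5 5

# Hypothetical outlier: replace the largest score with 100.
with_outlier = scores[:-1] + [100]

print(median(with_outlier), mode(with_outlier))          # 7.5 5
print(mean(scores), mean(with_outlier))                  # about 6.58 and about 14.08
```

The median and mode do not move at all, while the mean roughly doubles.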

Measures of Spread

Range: the difference between the highest score and the lowest score

-sensitive to outliers

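Interquartile Range: the range of the middle 50% of the scores (the first and last 25% of the scores are removed)

e.g. 1, 2, 5, 5, 5 | 7, 8, 8, 9, 9 | 10, 10, 11, 12, 15 | 16, 16, 16, 17, 20

(the three dividers mark Q1, the median, and Q3, from left to right)

Range: 20 - 1 = 19

Median: (9 + 10)/2 = 9.5

Q1 = (5 + 7)/2 = 6 and Q3 = (15 + 16)/2 = 15.5

Interquartile Range (IQR): Q3 - Q1 = 15.5 - 6 = 9.5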
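The same numbers can be reproduced with a short sketch. It assumes the "median of each half" way of locating the quartiles, which matches the dividers in the example; the helper name `quartiles_by_halves` is made up here, and other software may use a different quartile rule and give slightly different values.

```python
from statistics import median

scores = [1, 2, 5, 5, 5, 7, 8, 8, 9, 9,
          10, 10, 11, 12, 15, 16, 16, 16, 17, 20]

def quartiles_by_halves(values):
    """Return (Q1, median, Q3), taking Q1 and Q3 as the medians of the
    lower and upper halves (the middle score is left out when n is odd).
    Needs at least two values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return median(lower), median(ordered), median(upper)

q1, med, q3 = quartiles_by_halves(scores)
score_range = max(scores) - min(scores)
iqr = q3 - q1

print(score_range, med, q1, q3, iqr)   # 19 9.5 6.0 15.5 9.5
```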

Variance

Variance: the average of the squared deviations from the mean

*the deviations are squared because the raw deviations always add up to zero

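| $y$ | $y - \bar{y}$ | $(y - \bar{y})^2$ |
|---|---|---|
| 1 | -5 | 25 |
| 1 | -5 | 25 |
| 2 | -4 | 16 |
| 5 | -1 | 1 |
| 5 | -1 | 1 |
| 5 | -1 | 1 |
| 7 | 1 | 1 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 8 | 2 | 4 |
| 9 | 3 | 9 |
| 9 | 3 | 9 |
| 10 | 4 | 16 |
| $\bar{y} = 6$ | sum = 0 | sum = 116 |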

Population variance: $\sigma^2 = \frac{\sum (y - \bar{y})^2}{n}$

-this is used only when you have the whole population, or when you don't need to generalize from the sample to the population

Sample variance: $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$

-this is used when you have a sample and want to generalize to the population

*assume that a question is asking for the sample variance unless it explicitly says population
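A short sketch that applies both formulas to the thirteen scores from the deviation table (mean 6, squared deviations adding up to 116) and cross-checks the results against `pvariance` and `variance` from Python's `statistics` module.

```python
from statistics import mean, pvariance, variance

# The 13 scores from the y column of the deviation table.
scores = [1, 1, 2, 5, 5, 5, 7, 8, 8, 8, 9, 9, 10]

n = len(scores)
y_bar = mean(scores)                            # 6
deviations = [y - y_bar for y in scores]        # these add up to 0
sum_sq = sum(d ** 2 for d in deviations)        # 116

population_variance = sum_sq / n                # 116 / 13, about 8.92
sample_variance = sum_sq / (n - 1)              # 116 / 12, about 9.67

print(sum(deviations), sum_sq)
print(population_variance, pvariance(scores))   # the two values agree
print(sample_variance, variance(scores))        # the two values agree
```

The sample variance is the larger of the two because its denominator is smaller.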

Why is n-1 used instead of n for the sample variance?

The sample variance loses a degree of freedom because the deviations are taken from the sample mean, which is itself calculated from the same scores. If n is used instead, the result tends to underestimate the population variance (it comes out smaller than the population value). Dividing by n-1 compensates for this: a smaller denominator makes the sample variance larger.
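One way to see the underestimate is to take a tiny population, list every possible sample, and average the two versions of the variance over all of them. The sketch below assumes a made-up population of three scores (0, 2, 4) and samples of size 2 drawn with replacement; it is an illustration, not a proof.

```python
from itertools import product
from statistics import mean, pvariance

population = [0, 2, 4]                   # hypothetical toy population
true_variance = pvariance(population)    # 8 / 3, about 2.67

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = mean(sample)
    return sum((y - m) ** 2 for y in sample)

n = 2
# All 9 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = mean([sum_sq_dev(s) / n for s in samples])                 # 12 / 9
avg_div_n_minus_1 = mean([sum_sq_dev(s) / (n - 1) for s in samples])   # 24 / 9

print(true_variance, avg_div_n, avg_div_n_minus_1)
```

Averaged over every possible sample, dividing by n gives about 1.33, which is too small, while dividing by n-1 gives about 2.67, the same as the population variance.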