# GGR270H1 Lecture Notes - Kelvin, Collectively Exhaustive Events, Univariate

Variables and Data

Variable

Characteristic of the population that changes or varies over time

Examples include temperature, income, education, etc.

o Look at how income varies over a period of time, or at how it varies across different neighborhoods, cities, countries, regions

o Compare something that happens in one location with another location, e.g. by looking at census tracts

o Ex. looking at two variables, education and income, and at how those connect and vary in different locations

Observe and measure variables – before you start testing them

Two Key Categories

o Quantitative – numerical e.g. number of students who..

o Discrete (1, 2, 3, 4…) or Continuous (1.5, 2.5, 6.76, 3.89)

o Qualitative – non-numerical, e.g. male/female, plant species

You can count the number of males or females in class

Data

Results from measuring variables – set of measurements

Different Categories – Univariate, Bivariate , Multivariate

Variables – Scales of Measurement I

Scale defines the amount of information a variable contains and what statistical techniques can be used

o EXAM: you will be given a statistical problem and asked what statistical tests should be used. Two key pieces: at what scale of information are the variables measured, and how many samples am I dealing with? These two questions will allow you to narrow down which test you can use.

Four Scales

o Nominal

o Ordinal

o Interval

o Ratio

o (^ those are ordered lowest to highest)

The lowest has the least amount of information

Collect your data at the highest scale of information you can, because you are able to compress it to a lower scale later on; choose the highest scale based on what you are looking for

o Nominal

Lowest scale of measurement, no numerical value attached

Classifies observations into groups that are mutually exclusive (when grouping, each observation can fit in one group and one group only) and collectively exhaustive (there must be a group that every value can fall under)

Simply the name or category of the variable – you make categories, give each a numerical label if needed, and look at the frequency of your observations

E.g. occupation type, gender, place of birth (these are categories, and you just count how many people are in each of those categories)

Ex. for occupation type – how many people are in management, how many people are in general labour?

o Ordinal

Stronger scale as it allows data to be ordered or ranked

E.g. the 12 largest towns in a region, income by group (high, middle, low) – the process of ranking is ordinal level

The data yield more information than at the nominal scale, because you are able to weight (rank) each value

o Interval

Unit distance separating numbers is important

You have a unit scale, so each number has a lot more weight attached to it

E.g. Temperature (°C or °F)

But it does not allow ratios and does not have a “true” zero. E.g. 10 degrees is warmer than 5 degrees, but it is not double the temperature. The only time you can have a true zero with temperature is when you use the Kelvin scale – this would not be an interval scale because there is a definite zero.

o Ratio

Strongest scale of measurement

Ratios of distances on a number scale – you can say something is double something else

Presence of an absolute “Zero”

E.g. how much an individual pays for rent per month is a ratio scale

Any time zero means a true absence of the quantity (e.g. paying no rent), it’s a ratio scale

o In practice, we consider interval/ratio scales together

o Ex. you get the value of rent per month from a person, which is ratio, and then you can convert it to ordinal by saying what category they fall under
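The rent example and the temperature example above can be sketched in Python. This is only an illustration: the rent values, the cutoffs, and the function names are hypothetical and do not come from the lecture.

```python
# Hypothetical monthly rents in dollars (illustrative values only).
rents = [0, 850, 1200, 2400]


def rent_category(rent, low_cutoff=1000, high_cutoff=2000):
    """Compress a ratio-scale rent into an ordinal category.

    The cutoffs are arbitrary assumptions chosen for illustration.
    """
    if rent < low_cutoff:
        return "low"
    if rent < high_cutoff:
        return "middle"
    return "high"


# Ratio -> ordinal: information is lost, but the ranking is kept.
categories = [rent_category(r) for r in rents]
print(categories)  # ['low', 'low', 'middle', 'high']

# Ratio scale: rent has a true zero, so "double the rent" is meaningful.
rent_ratio = 2400 / 1200
print(rent_ratio)  # 2.0


def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (true zero)."""
    return c + 273.15


# Interval scale: 10 C is not double 5 C. On the Kelvin scale the
# ratio of the two temperatures is only slightly above 1.
kelvin_ratio = celsius_to_kelvin(10) / celsius_to_kelvin(5)
print(round(kelvin_ratio, 3))  # 1.018
```

Going the other way (from "low/middle/high" back to a dollar value) is not possible, which is why data are collected at the highest scale available.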

Describing Data I

Graphs

Pie charts

o Circular graphs where measurements are distributed among categories

o You slice based on the frequency of the observation

o E.g. counting the number of people that use different types of transit – Distribution of Transit Use: cars, bikes, public transit
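A minimal sketch of how the slices follow from the frequencies, assuming each slice angle is the category's share of the total times 360 degrees. The counts below are made up for illustration; they are not data from the lecture.

```python
# Hypothetical counts of people per transit type (not real data).
transit_counts = {"cars": 50, "bikes": 20, "public transit": 30}

total = sum(transit_counts.values())

# Each slice's angle is proportional to the category's frequency.
slice_degrees = {
    mode: 360 * count / total for mode, count in transit_counts.items()
}

for mode, degrees in slice_degrees.items():
    print(f"{mode}: {degrees} degrees")
```

The angles always sum to 360 because the categories are mutually exclusive and collectively exhaustive.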

Bar Graph

o Graph where measurements are distributed among categories

o Ex. arranging how many students got an A, B, C, etc. in a course

Relative Frequency Histogram I

Graphs quantitative, rather than qualitative, data

Vertical axis (y) shows “how often” measurements fall into a particular class or subinterval (the frequency)

Classes are plotted on the horizontal (x) axis

Rules of thumb

o 5 to 12 intervals or categories (anything more gets too complex)

o 1 + 3.3 log10(# of observations) – to figure out how many groups you need

o Classes must be mutually exclusive and collectively exhaustive (every element in your data set must be able to fit in a class) – if some don’t fit, then you often have to add one or two classes in order for them to fit

o Intervals should be the same width – if one is too long, you just add one more bar

Example:

Observations: 1, 11, 14, 21, 23, 27, 28, 33, 35, 50

# of classes: k is just a notation for the number of classes

k = 1 + 3.3 log10(10)

= 1 + 3.3(1)

= 4.3, rounded up to 5

Class width:

(Largest # - Smallest #) / # of classes

= (50 - 1) / 5

= 9.8, rounded up to 10

Therefore, we will have 5 classes and each class will be 10 units wide. When you show this in Excel you have to write out what you did.
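The same arithmetic can be checked with a short Python sketch. The variable names are only illustrative, and rounding up is done with `math.ceil` to match the example above.

```python
import math

observations = [1, 11, 14, 21, 23, 27, 28, 33, 35, 50]

n = len(observations)

# Rule of thumb for the number of classes: 1 + 3.3 * log10(n).
k_exact = 1 + 3.3 * math.log10(n)
k = math.ceil(k_exact)  # round up to a whole number of classes

# Class width: (largest - smallest) / number of classes, rounded up.
width_exact = (max(observations) - min(observations)) / k
width = math.ceil(width_exact)

print(k_exact, k)          # 4.3 5
print(width_exact, width)  # 9.8 10
```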

o DO NOT PUT A GAP BETWEEN THE BARS

o On the scale, instead of one class going 0-10 and the next starting at 11, your first class goes 0-9.9, so there is no gap before the next class.

o The mathematics gives you an optimal number, but you could have made 6 classes instead of 5. You have to explain why you made 6 classes instead of 5. ALWAYS DOCUMENT AND GIVE A REASON FOR WHAT YOU ARE DOING.

o You can put a jagged line (axis break) on the axis, e.g. from 0 to 100, so you don’t have a huge gap
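As a rough sketch, the relative frequencies for the example data can be computed as below. It assumes the first class starts at the smallest observation (1) and that each class includes its lower bound but not its upper bound; both choices are assumptions made here, not something stated in the lecture.

```python
observations = [1, 11, 14, 21, 23, 27, 28, 33, 35, 50]

k = 5       # number of classes from the example
width = 10  # class width from the example

# Assumption: the first class starts at the smallest observation.
start = min(observations)

# Count how many observations fall in each class [lower, lower + width).
counts = [0] * k
for x in observations:
    index = (x - start) // width
    counts[index] += 1

# Relative frequency = class count / total number of observations.
relative = [c / len(observations) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    lower = start + i * width
    upper = lower + width
    print(f"[{lower}, {upper}): count={c}, relative frequency={r}")
```

With these choices every observation lands in exactly one class, so the classes are mutually exclusive and collectively exhaustive. If the first class started at 0 instead, the value 50 would fall outside five classes of width 10, which is one reason you might document a choice of 6 classes.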