GGR270H1 Lecture Notes - Kelvin, Collectively Exhaustive Events, Univariate

15 Oct 2013
Variables and Data
Variable: a characteristic of the population that changes or varies over time (or from place to place)
Examples include temperature, income, education, etc.
o Look at how income varies over a period of time, or at how it
varies across neighbourhoods, cities, countries, regions
o Compare what happens in one location with what happens in
another location, e.g. by looking at census tracts
o Ex. take two variables, education and income, and look at how
they are connected and how they vary across locations
Observe and measure variables before you start testing them
Two Key Categories
o Quantitative: numerical, e.g. number of students who..
o Either discrete (1, 2, 3, 4…) or continuous (1.5, 2.5, 6.76, 3.89)
o Qualitative: non-numerical, e.g. male/female, plant species
You can count the number of males or females in class
Data: the results from measuring variables, i.e. a set of measurements
Different categories: univariate, bivariate, multivariate
Variables Scales of Measurement I
Scale defines amount of information a variable contains and what statistical
techniques can be used
o EXAM: you are given a statistical problem and asked which statistical tests
should be used. Two key pieces: at what scale of information are the
variables measured, and how many samples am I dealing with?
These two questions will allow you to narrow down which test you can use
Four Scales
o Nominal
o Ordinal
o Interval
o Ratio
o (^ listed from lowest to highest)
The lowest scale carries the least amount of information
Collect your data at the highest scale of information you can,
because you are able to compress it to a lower scale later on;
choose the scale based on what you are looking for
o Nominal
Lowest scale of measurement, no numerical value attached
Classifies observations into mutually exclusive (when grouping,
each observation fits in one group and one group only) and
collectively exhaustive (there must be a group that every
value can fall under) groups
Simply the name or category of the variable: you make
categories, give each a numerical code (a label only), and look at
the frequency of your observations
E.g. occupation type, gender, place of birth (these are
categories, and you just count how many people are in each of
those categories)
Ex. for occupation type: how many people are in
management, how many people are in general labour?
o Ordinal
Stronger scale as it allows data to be ordered or ranked
E.g. the 12 largest towns in a region, or income by group
(high, middle, low); the process of ranking is ordinal level
The counts yield more information, because you are
able to weigh each value against the others
o Interval
Unit distance separating numbers is important
You have a unit scale, so each number carries a lot more weight
E.g. Temperature (C or F)
But it does not allow ratios and does not have a “true” zero. Ex.
10 degrees is warmer than 5 degrees, but it is not
double the temperature. The only time you have a true zero
with temperature is when you use the Kelvin scale; that would
not be an interval scale because there is a definite zero.
o Ratio
Strongest scale of measurement
Ratios of distances on a number scale are meaningful: you can say
something is double something else
Presence of an absolute “Zero”
E.g. how much an individual pays for rent a month is a ratio scale
Any time zero means a true absence of the quantity, it’s a ratio scale
o In practice, we consider interval/ratio scales together
o Ex. You get the value of rent per month from a person; that is ratio,
and then you can convert it to ordinal by saying what category they fall into
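The difference between the scales can be sketched in a few lines of Python. This is only an illustration: the rent values and the low/middle/high cutoffs (800 and 1500) are made up for the example and do not come from the lecture.

```python
from collections import Counter

# Interval vs. ratio: Celsius has no true zero, Kelvin does.
def celsius_to_kelvin(celsius):
    return celsius + 273.15

naive_ratio = 10 / 5  # 2.0, but "twice as warm" is not a valid reading
kelvin_ratio = celsius_to_kelvin(10) / celsius_to_kelvin(5)  # about 1.018

# Ratio -> ordinal: compress monthly rent into ranked categories.
# The cutoffs are hypothetical, chosen only for illustration.
def rent_category(rent, low_cutoff=800, high_cutoff=1500):
    if rent < low_cutoff:
        return "low"
    if rent < high_cutoff:
        return "middle"
    return "high"

rents = [0, 650, 900, 1200, 1800, 2400]  # made-up monthly rents
categories = [rent_category(r) for r in rents]

# Counting per category is all that is left once the data is compressed.
counts = Counter(categories)

print(naive_ratio, round(kelvin_ratio, 3))
print(categories)
print(counts)
```

Going from rent in dollars to low/middle/high throws information away, which is why the compression only works in one direction: the categories cannot be turned back into dollar amounts.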
Describing Data I
Pie charts
o Circular graphs where measurements are distributed among categories
o You slice based on the frequency of the observations
o E.g. counting the number of people that use different types of transit
Distribution of transit use: cars, bikes, public transit
Bar Graph
o Graph where measurements are distributed among categories
o Ex. showing how many students got an A, B, C, etc. in a course
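Both graphs start from the same step: counting how often each category occurs. A minimal sketch in Python, using made-up transit responses (not data from the lecture), shows the counts that would become bar heights and the slice sizes that would go into a pie chart.

```python
from collections import Counter

# Made-up survey responses, one per person (not data from the lecture).
responses = ["car", "car", "public transit", "bike", "car",
             "public transit", "car", "bike", "public transit", "car"]

counts = Counter(responses)   # bar heights: frequency per category
total = sum(counts.values())

# Pie slices: each category's share of the full circle, in degrees.
slice_degrees = {mode: n * 360 / total for mode, n in counts.items()}

for mode, n in counts.most_common():
    print(f"{mode:15s} count={n:2d} slice={slice_degrees[mode]:.0f} degrees")
```

The slices always add up to 360 degrees because the categories are mutually exclusive and collectively exhaustive: every response is counted exactly once.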
Relative Frequency Histogram I
Graphs quantitative, rather than qualitative, data
Vertical axis (y) shows “how often” measurements fall into a
particular class or subinterval (the frequency).
Classes are plotted on the horizontal (x) axis
Rules of thumb
o 5 to 12 intervals or categories (anything more gets too complex)
o k = 1 + 3.3 log10(# of observations) to figure out how many groups you need
o Classes must be mutually exclusive and collectively exhaustive (every element
in your data set must fit in a class); if some don’t fit, then
you often have to add one or two classes so that they do
o Intervals should be the same width; if one is too long, you just add
one more bar
Observations: 1, 11, 14, 21, 23, 27, 28, 33, 35, 50
# of classes: k is just the notation for the number of classes
k = 1 + 3.3 log10(10)
= 4.3, rounded up to 5
Class width:
(Largest # - Smallest #) / # of classes
= (50 - 1)/5
= 9.8, rounded up to 10
Therefore, we will have 5 classes and each class will be 10 units wide. When you
show this in Excel you have to write out what you did.
o On the scale, instead of going 0-10 and starting the next class at 11, your first
class goes 0-9.9 (so the next one starts at 10).
o The mathematics gives you an optimal number, but you could have made 6
classes instead of 5. You have to explain why you made it 6 classes instead of 5.
o You can put a jagged line (axis break) on the axis, e.g. from 0 to 100, so you don’t have that huge gap
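The worked example can be checked with a short Python sketch. It uses the ten observations from the notes and the same two formulas (the class-count rule is often called Sturges' rule). Starting the first class at the smallest observation and treating classes as half-open intervals are assumptions of this sketch, not something the lecture specifies.

```python
import math

observations = [1, 11, 14, 21, 23, 27, 28, 33, 35, 50]
n = len(observations)

# Number of classes: k = 1 + 3.3 * log10(n), rounded up.
k = math.ceil(1 + 3.3 * math.log10(n))

# Class width: (largest - smallest) / k, rounded up.
width = math.ceil((max(observations) - min(observations)) / k)

# Assumption: classes start at the smallest observation and are
# half-open, [lower, upper), so no value can fall into two classes.
# (If the range were exactly k * width, the largest value would need
# one extra class; that does not happen with this data.)
start = min(observations)
frequencies = [0] * k
for x in observations:
    frequencies[(x - start) // width] += 1

# Relative frequency: share of all observations in each class.
relative = [f / n for f in frequencies]

for i, (f, r) in enumerate(zip(frequencies, relative)):
    lower = start + i * width
    print(f"[{lower}, {lower + width}): frequency={f}, relative={r:.1f}")
```

If the classes start at 0 instead (0-9.9, 10-19.9, and so on), five classes of width 10 stop just short of 50, so the largest observation would need a sixth class. That is one reason you might justify using 6 classes instead of 5.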