
GGR270H1 Study Guide - Univariate, Descriptive Statistics, Central Tendency


Department
Geography
Course Code
GGR270H1
Professor
Damian Dupuy

1) Simple Measures = One variable: you do a census / Two variables: you do correlations (the connection between two variables, e.g. education and income).
2) Probability and Distributions = The chances of something happening (e.g. the chances of winning a lottery).
3) Statistical Estimation = E.g. take the test marks of the past 3 years and estimate what those marks will be in future / There is a process to follow, and then a set of techniques to look at.
4) What are (is) Statistics? = Refers to a set of techniques / Any collection of numerical data, for example: Vital Statistics (birth rates, death rates; use this data to compare across jurisdictions) / Economic Indicators (unemployment rates, income levels; measuring different things and looking at how they all work together).
5) Methodology for collecting, presenting and analyzing data = Summarize findings / Theory validation (it is your argument that matters) / Forecast (we can forecast change, e.g. what the population of Toronto will look like in 2015) / Evaluate (analyze the results and do some evaluation), which then allows us to select among alternatives.
6) Descriptive Statistics = Organizing and summarizing data / Summaries of our data and what it actually shows / Replace a large set of numbers with a small set of summary measures / Goal of the techniques is to minimize information loss (e.g. information is lost when you ask people to place themselves in an income group instead of asking how much they actually make).
7) Inferential Statistics = Links descriptive statistics to probability theory (e.g. what are the chances of the same average happening again in the midterm?) / Generalize the results from a smaller group to a much larger one / Goal is to "infer" (conclude) something about a larger group by looking at a smaller one.
8) Population = The bigger group / Total set of elements (objects, persons, regions) under examination / Its size (the number of observations) is denoted N.
9) Sample = Subset of elements in the population / Used to make inferences about certain characteristics of the population / We try to predict the behaviour of the population by looking closely at the sample / Its size is denoted n.
1) Variable = A characteristic of the population that changes or varies over time or over space (where things happen), e.g. income distribution / Space helps us to look at these variables / Examples include temperature, income, education level, etc. / We observe and measure variables / Two key categories: quantitative and qualitative.
2) Quantitative = Numerical, e.g. the number of students who... / We can measure using numbers: we can assign numbers or we can count / Discrete (1, 2, 3, 4...) or continuous (1.5, 2.76, 3.445...).
3) Qualitative = Non-numerical, e.g. male/female, plant species, education type / Depending on what you are doing, qualitative categories can be turned into quantities by counting, e.g. 2 men or 6 women.
4) Data = "Data" are always plural, because they are based on many measurements / Data result from measuring variables: you have a variable, and after measuring it you get a set of data / Different categories: Univariate (one variable and one set of data) / Bivariate (two variables at work) / Multivariate (a series of variables, e.g. measuring gender, income and level of schooling) / Variables and data can be defined simply, but there is more to it: how they are defined influences how we measure.
5) Variables: Scales of Measurement 1 = The scale defines the amount of information a variable contains and what statistical techniques can be used / How much information should that variable hold for us? / All variables contain information, so how we measure depends on how we get that information.
6) Four Scales (from lowest to highest) = Nominal, Ordinal, Interval, Ratio / We always try to measure our variables at the ratio scale: it is more precise and can be decomposed, and an absolute zero is the key / The more information you have, the more you can collapse it, all the way down to a nominal category if you wish / You can never go the opposite way.
7) Nominal = Scale of measurement with no numerical value attached / Any numbers used carry no weight or information / Classifies observations into mutually exclusive and collectively exhaustive groups: every observation must fit into a category, and into only ONE category / We are just assigning names, e.g. male and female; there is no weight or scale / Often called "categorical" data / E.g. occupation type, gender, place of birth / Objects placed into a box.
8) Ordinal = Stronger scale, as it allows data to be ordered or ranked / E.g. categorizing people into groups / Nominal would ask "do you have an income?"; ordinal would give you options and you have to pick one, e.g. which category of income are you in: 1) 0-1000 2) 2000-3000 / E.g. the 12 largest towns in a region, income by group (high, middle, low).
9) Interval = The unit distance separating numbers is meaningful, but the zero point is arbitrary, so ratios between values do not carry meaning / E.g. temperature (F or C), taxable income ($) / The freezing point differs between scales (0 in C, 32 in F) / Good examples are tricky to find.
10) Ratio = Strongest scale of measurement / Ratios of distances on the number scale are meaningful / Presence of an absolute zero: you can't have less than zero / A grade is an example of the ratio scale, e.g. 40% is twice as good as 20% / E.g. temperature (Kelvin), income from all sources ($), population of a city / In practice, interval and ratio scales are considered together.
DESCRIBING DATA 1: GRAPHS
11) Pie Chart = Circular graph where measurements are distributed among categories.
12) Bar Graph = Graph where measurements are distributed among categories.
13) Relative Frequency Histogram 1 = Important because it gives us the basis of the normal curve (bell curve) / Graphs quantitative, rather than qualitative, data / Vertical axis (Y) shows "how often" measurements fall into a particular class or subinterval / Classes are plotted on the horizontal (X) axis / Rules of thumb for how many classes to use: 5-12 intervals or categories, or 1 + 3.3 log10(n), where n is the number of observations / Classes must be mutually exclusive and collectively exhaustive: every observation must fit into one category and only ONE / E.g. if we have 7 classes and 1 observation that doesn't fit into any of them, we create another class of the same size / Intervals MUST all be the same width / Class width = (largest value minus smallest value) divided by the number of classes / If the data have 1 decimal place, report 1 decimal place, and so on / Don't leave any space between the bars on a histogram.
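The histogram rules of thumb above (number of classes, equal class width, one class per observation) can be sketched in Python roughly as follows. This is only an illustration: the function name `histogram_classes` and the temperature values are hypothetical, rounding the rule-of-thumb result to the nearest whole number is an assumption, and the values are assumed not to be all identical.

```python
import math

def histogram_classes(values, n_classes=None):
    """Return (class_edges, relative_frequencies) for equal-width classes."""
    n = len(values)
    if n_classes is None:
        # Rule of thumb from the notes: 1 + 3.3 * log10(n).
        # Rounding to the nearest whole number is an assumption.
        n_classes = round(1 + 3.3 * math.log10(n))
    lo, hi = min(values), max(values)
    # Class width: (largest - smallest) / number of classes
    width = (hi - lo) / n_classes
    edges = [lo + k * width for k in range(n_classes + 1)]
    counts = [0] * n_classes
    for v in values:
        # Every observation lands in exactly one class;
        # the largest value is placed in the last class.
        k = min(int((v - lo) / width), n_classes - 1)
        counts[k] += 1
    rel_freq = [c / n for c in counts]  # "how often", as a share of n
    return edges, rel_freq

# Hypothetical temperature readings (one decimal place)
temps = [2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 5.8, 6.1, 6.3, 7.0, 7.7, 8.5]
edges_default, rel_default = histogram_classes(temps)
edges_4, rel_4 = histogram_classes(temps, n_classes=4)
```

Because every observation is counted once, the relative frequencies always sum to 1, which is a quick check that the classes are mutually exclusive and collectively exhaustive.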
14) Skewness = Is the distribution symmetric? / If there is symmetry, the graph looks like a normal curve: the same on both sides / A normal distribution looks like a bell curve, which means there is no skewness / A distribution can be positively skewed (the tail is towards the right) or negatively skewed (the tail is towards the left) / The curve gets pulled out in the direction of the tail.
15) Mode = A normal distribution is unimodal, with one peak / A bimodal distribution has two peaks.
16) Kurtosis = How peaked is the distribution? Is it a short or a tall peak? / This property is called kurtosis.
17) Normal Curve = Called mesokurtic / The mode is in the middle of a symmetric distribution / Normal spread.
18) Platykurtic = The curve may still be bell-shaped, but it is very flat / Values are widely spread.
19) Leptokurtic = Departs from the normal curve / Very tall and very peaked curve.
20) Measures of the Centre = Deal with the average, e.g. the mean.
21) Measures of Variability = Compare the values with the centre / Describe the direction and the size of the dispersion.
22) Statistics and Parameters = Graphs are limited in what they can tell us / The more we summarize, the more information we lose / It is difficult to make inferences about a population when looking at a subset or sample / Therefore, we need to use numerical measures / Measures associated with the population are called Parameters / Measures associated with a sample are called Statistics.
23) Measures of the Centre: Mean = The average of the values (e.g. when we ask "what is the average?") / The most commonly used measure of central tendency / If left to choose one measure, choose the mean: it holds the most information / Sum of all values or observations divided by the number of observations.
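Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ / Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
(Always round to match the data you are given: report the same number of decimal places as the data.)
Where: n = number of observations in the sample (N in the population) / $\bar{X}$ = the sample mean (X with a bar on top) / $\Sigma$ = sigma, which means you are adding something up / $X_i$ = the i-th observation / i = 1 under the sigma means you start with the first observation and add every observation up to the n-th.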
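A minimal Python sketch of the mean as defined above (sum of all observations divided by the number of observations). The function name and the six temperature values are hypothetical, chosen only for illustration; the result is rounded to one decimal place to match the data.

```python
def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)

# Hypothetical temperatures, recorded to one decimal place
temps = [8.2, 8.8, 9.1, 9.4, 9.9, 10.3]

# Report the mean to the same number of decimal places as the data
sample_mean = round(mean(temps), 1)
print(sample_mean)
```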
Measures of the Centre 2
1) Median = Value occupying the middle position in an ordered set of observations / With the median, the size of the values doesn't necessarily matter; what matters is which value occupies the middle position / Order the observations, lowest to highest, and find the middle position / Formula: 0.5(n+1) (this gives you the position).
2) Median Example (even number of observations) = 0.5(6+1) = 3.5th position in the ordered set / So add the 3rd and 4th values and multiply by 0.5: 1/2(9.1+9.4) = 1/2(18.5) = 9.25 / Median temperature is 9.25.
3) Mode = Value that occurs with the highest frequency / Allows you to locate the peak of a relative frequency histogram / Mode is 3 household members (based on the table presented).
Choosing an Appropriate Measure = The mean is usually the best measure, as it is sensitive to a change in a single observation / It takes on board the changes that can or might occur in the observations / It can also be the most representative measure / However, it is not always a good measure. It is not a good measure when:
4) The distribution is bimodal (2 modes) = The mean does not tell you where the high peaks are.
5) The distribution is skewed = Use the median instead / The mean is good to use for a normal distribution.
6) Normal = Mean, median and mode will all be found in the middle.
7) Bimodal = Two modes / Mean and median in the middle.
8) Positively Skewed = Mode will be to the LEFT / Mean will be to the RIGHT, because it takes on board the value of the outlier / Median will be somewhere in the middle.
9) Negatively Skewed = Mode will be on the RIGHT / Mean will be on the LEFT / Median will be somewhere in the middle.
Measures of Dispersion 1
10) Range = Simplest measure of dispersion / Takes the difference between the smallest and largest value in the data set, at the interval/ratio scale / But it is influenced by outliers / Formula: Range = Xmax - Xmin (the largest value minus the smallest value).
11) Quartiles = Can yield more information and lessen the impact of outliers / Data are arranged in increasing order and divided into quartiles (four equal groups) / Because it uses the median, it lessens the impact of outliers.
12) Standard Deviation and Variance = Two of the most commonly used measures of dispersion / Compare each value to the mean: $X_i - \bar{X}$ / Two key properties of the mean: the sum of the differences always adds up to zero, and the sum of the squared differences is the minimum sum possible (called the "Least Squares" property) / The Least Squares property carries over into calculating the variance, $s^2$.
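As a rough illustration of the median position formula 0.5(n+1), the mode, and the range described in these notes, here is a minimal Python sketch. The function names are hypothetical, the household sizes are made up, and of the six temperatures only the 3rd and 4th values (9.1 and 9.4) come from the notes.

```python
from collections import Counter

def median(values):
    """Median via the position formula 0.5 * (n + 1)."""
    ordered = sorted(values)              # lowest to highest
    n = len(ordered)
    position = 0.5 * (n + 1)              # 1-based position in the ordered set
    lower = ordered[int(position) - 1]
    if position == int(position):         # odd n: a single middle value
        return lower
    upper = ordered[int(position)]        # even n: average the two neighbours
    return 0.5 * (lower + upper)

def modes(values):
    """All values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

def value_range(values):
    """Range = Xmax - Xmin."""
    return max(values) - min(values)

# Hypothetical data: only 9.1 and 9.4 appear in the notes
temps = [8.2, 8.8, 9.1, 9.4, 9.9, 10.3]
household_sizes = [1, 2, 3, 3, 3, 4, 2, 5]   # made-up household sizes

print(median(temps))            # 3.5th position: halfway between 9.1 and 9.4
print(modes(household_sizes))   # most frequent household size(s)
print(value_range(temps))
```

Returning a list from `modes` is a design choice so that a bimodal data set reports both peaks instead of silently picking one.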
Formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ / Small n = sample size / The -1 is the correction: any time we use sample data we divide by n-1, to remove bias.
Problem with the variance: it is in squared units, so you get a massive value that does not match the data / To bring it back down so that you can compare it against the original data, take the square root / This needs only a small change to the formula: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$, the standard deviation.
13) Skewness = Measures the degree of symmetry in a frequency distribution / Determines how evenly (or unevenly) the values are distributed either side of the mean / Formula: (not shown in the source).
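The variance and standard deviation steps from the notes (deviations from the mean, squaring, dividing by n-1, then taking the square root) can be sketched as below. The function names and the five data values are hypothetical; the sketch assumes sample data, hence the n-1 divisor.

```python
import math

def deviations(values):
    """Difference between each observation and the mean (Xi - Xbar)."""
    x_bar = sum(values) / len(values)
    return [x - x_bar for x in values]

def sample_variance(values):
    """Sum of squared deviations divided by n - 1 (sample correction)."""
    return sum(d ** 2 for d in deviations(values)) / (len(values) - 1)

def sample_std_dev(values):
    """Square root of the variance, back in the units of the data."""
    return math.sqrt(sample_variance(values))

data = [4, 8, 6, 2, 10]            # made-up observations, mean = 6

print(sum(deviations(data)))       # deviations from the mean add up to zero
print(sample_variance(data))       # 40 / 4
print(sample_std_dev(data))
```

The Least Squares property can be checked informally by replacing the mean in `deviations` with any other value: the sum of squared differences only gets larger.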
14) Coefficient of Variation = Allows for comparison of variability across spatial samples / Lets you test which sample has the greatest variability / Standard deviation and variance are absolute measures, so they are influenced by the size of the values in the data set / To allow a comparison of variation across two or more geographic samples, use a relative measure of dispersion called the Coefficient of Variation /
Formula: $CV = \frac{s}{\bar{X}}$ (often multiplied by 100 to give a percentage) / The larger the value, the greater the variability.
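A short Python sketch of why a relative measure is useful when comparing samples. It assumes the usual definition of the coefficient of variation (sample standard deviation divided by the mean, expressed here as a percentage); the function name and both regional data sets are hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Assumed definition: sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Made-up samples from two regions, measured on very different scales
region_a = [2, 3, 4, 5, 6]
region_b = [200, 210, 190, 205, 195]

sd_a, sd_b = statistics.stdev(region_a), statistics.stdev(region_b)
cv_a = coefficient_of_variation(region_a)
cv_b = coefficient_of_variation(region_b)

# Region B has the larger standard deviation (an absolute measure),
# but Region A has the greater variability relative to its mean.
print(sd_a, sd_b)
print(cv_a, cv_b)
```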