cheat sheet1.odt

Damian Dupuy

1) Simple Measures = one variable (you do a census) or two variables (you do correlations, i.e. the connection between two variables, ex. education and income)
2) Probability and Distributions = the chances of something happening (ex. the chances of winning a lottery)
3) Statistical Estimation = ex. take the test marks of the past 3 years and estimate what those test marks will be in the future / there is a process that we have to follow, and then a set of techniques that we have to look at
4) What are (is) Statistics? = Refers to a set of techniques / Any collection of numerical data
7) Vital Statistics = birth rates, death rates (use this data and compare across jurisdictions)
5) Economic Indicators = unemployment rates, income levels / measuring different things and looking at combinations of these, how they all work together
6) Methodology for collecting, presenting and analyzing data = Summarize findings / Theory validation (it's your argument that matters) / Forecast (we can forecast change, ex. what the population of Toronto will look like in 2015) / Evaluate (analyze the results and do some evaluation) / This then allows us to select among alternatives
7) Descriptive and Inferential Statistics = Descriptive: organizing and summarizing data / summaries of our data, what it actually shows / Replace a large set of numbers with a small set of summary measures / Goal of the techniques is to minimize information loss / ex. asking people to place themselves in an income group, rather than asking how much they actually make
8) Inferential = Links descriptive statistics to probability theory / ex. what are the chances of the average happening again in the midterm / Generalize the results of a smaller group to a much larger one / Goal is to "infer" (conclude) something about a larger group by looking at a smaller one
9) Population = The bigger group / Total set of elements (objects, persons, regions) under examination / Denoted as N (the number of observations, i.e. the size of the population)
10) Sample = Subset of elements in the population / Used to make inferences about certain characteristics of the population / Try to predict the behaviour of the population by looking closely at the sample / Denoted as n
1) Variable = Characteristic of the population that changes or varies over time and by place (where things happen) / ex. income distribution / Space helps us to look at these variables / Examples include temperature, income, education etc. / Basically, we are measuring temperature, education level, and so on / ** Observe and measure variables / Two key categories: quantitative and qualitative
2) Quantitative = Numerical, e.g. number of students who... / we can measure using numbers: we can assign numbers or we can count / Discrete (1, 2, 3, 4...) or Continuous (1.5, 2.76, 3.445...)
3) Qualitative = Non-numerical, e.g. male/female, plant species, education type / depending on what you are doing, qualitative categories can be turned into quantitative counts, ex. 2 men or 6 women
4) Data = "Data" is always plural, because it is always based on many numbers and many things / Results from measuring variables: a set of measurements *** / you have a variable, and after measuring it, what you get is a set of DATA / Different categories: Univariate (one variable and one set of data) / Bivariate (two variables at work) / Multivariate (ex. measuring gender, income, level of schooling; when you have a bunch of measurements, you have a series of variables) / Variables and data can be defined in a simple way, but there is more to them than that, and they influence how we measure and so on
5) Variables, Scales of Measurement 1 = Scale defines the amount of information a variable contains and what statistical techniques can be used / How much information should that variable have for us? / All variables contain information, so how we measure depends on how we get that information
6) Four Scales (ranging from lowest to highest) = Nominal, Ordinal, Interval, Ratio / We always try to measure our variables at the RATIO scale / It can be decomposed, it is more precise, ZERO is the KEY / The more information you have, the more you can collapse it, all the way down to a nominal category if you wish / You can never go the opposite way
7) Nominal = Scale of measurement with no numerical value attached / any numbers used don't carry weight or information / Classifies observations into mutually exclusive and collectively exhaustive groups / collectively exhaustive: every observation must fit into a category / mutually exclusive: the groups must not overlap, so each observation has one choice only and fits into only ONE group / ex. we are just assigning names, Male and Female; there is no weight or scale / Often called "categorical" data / e.g. occupation type, gender, place of birth / objects placed into a box
8) Ordinal = Stronger scale, as it allows data to be ordered or ranked / ex. categorizing people into groups / Nominal would ask "do you have an income?"; Ordinal would give you options and you have to pick one, ex. which category of income are you in: 1) 0-1000 2) 2000-3000 / e.g. 12 largest towns in a region, income by group (high, middle, low)
9) Interval = Unit distance separating numbers is important / but the zero point is arbitrary, so the numbers don't carry ratio weighting / e.g. temperature (F or C), taxable income ($) / the freezing point falls at a different number on each temperature scale / It is very tricky to find examples
10) Ratio = Strongest scale of measurement / Ratios of distances on the number scale are meaningful / Presence of an absolute ZERO / You can't have less than ZERO / A grade is an example of the RATIO scale: ex. 40% is twice as much as 20% / e.g. temperature (Kelvin), income from all sources ($), population of a city / In practice, we consider the interval and ratio scales together
DESCRIBING DATA 1: GRAPHS
11) Pie Chart = Circular graph where measurements are distributed among categories
12) Bar Graph = Graph where measurements are distributed among categories
13) Relative Frequency Histogram 1 = Important for understanding, because it gives us the basis of the normal curve (bell curve) / Graphs quantitative, rather than qualitative, data / Vertical axis (y) shows "how often" measurements fall into a particular class or subinterval / Classes are plotted on the horizontal (x) axis / RULES OF THUMB, how many classes should I have? / 5-12 intervals or categories ***** or $1 + 3.3 \log_{10}(n)$, where n = number of observations / Classes must be mutually exclusive and collectively exhaustive: every observation must fit into a category and only ONE category / e.g. if we have 7 classes and 1 observation that doesn't fit into any of them, we create another class of the same size / Intervals MUST all be the same width --> Class width = (largest value - smallest value) / number of classes / if the data have 1 decimal place, report 1 decimal place, and so on / Don't leave any space between the BARS of a HISTOGRAM
14) Skewness = Is the distribution symmetric? / If it is, the graph looks like a normal curve, the same on both sides / Normal looks like a bell curve, which means there is no skewness / It can be Positively Skewed --> the tail is towards the right, or Negatively Skewed --> the tail is towards the left / the curve gets pulled out
15) Mode = Normal has one peak / Bimodal has two peaks
16) Kurtosis = How peaked is the distribution? Is it a short or a tall peak? That property is referred to as kurtosis
17) Normal Curve = Called Mesokurtic / the middle case, with a symmetric distribution / normal spread
18) Platykurtic = The curve may look normal, but it is spread very flat / wide spread of values
19) Leptokurtic = Departs from the normal shape / very tall and very peaked curve
20) Measures of the Centre = deal with the average, i.e. the mean
21) Measures of Variability = compare the numbers against the centre / direction and size of the dispersion
21) Statistics and Parameters = Graphs are limited in what they can tell us / Every time we summarize, we lose more information / Difficult to make inferences about a population when looking at a subset or sample / Therefore, we need to use numerical measures / Measures associated with the population are called Parameters / Measures associated with a sample are called Statistics
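The histogram rules of thumb above (number of classes from 1 + 3.3 log10(n), and class width from largest minus smallest over the number of classes) can be sketched in Python. This is only an illustration: the function names and the temperature values are made up, and rounding the class count to the nearest whole number is an assumption, since the notes do not say how to round.

```python
import math


def number_of_classes(n):
    """Rule of thumb from the notes: 1 + 3.3 * log10(n).

    Rounding to the nearest whole number is an assumption.
    """
    return round(1 + 3.3 * math.log10(n))


def class_width(values, k):
    """(largest value - smallest value) / number of classes."""
    return (max(values) - min(values)) / k


def relative_frequencies(values, k):
    """Share of observations falling in each of k equal-width classes.

    Assumes the values are not all identical (the width would be zero).
    """
    lowest = min(values)
    width = class_width(values, k)
    counts = [0] * k
    for v in values:
        # The largest value would land one past the last class,
        # so clamp it into the last class.
        index = min(int((v - lowest) / width), k - 1)
        counts[index] += 1
    return [c / len(values) for c in counts]


# Hypothetical temperatures, invented for illustration only.
temps = [8.2, 8.9, 9.1, 9.4, 9.8, 10.3, 10.9, 11.7, 12.0, 12.6]
k = number_of_classes(len(temps))
print("classes:", k)
print("class width:", class_width(temps, k))
print("relative frequencies:", relative_frequencies(temps, k))
```

Because every observation is placed in exactly one class, the relative frequencies add up to 1, which is the "mutually exclusive and collectively exhaustive" rule in practice.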
22) Measures of the Centre = Mean --> the average of the values (e.g. when we ask "what is the average?") / most commonly used measure of central tendency / choose the mean, the one that holds the most information (if left to choose one) / Sum of all values or observations divided by the number of observations
Sample: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ / Population: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$ (always round to match the data you are given)
where n = number of observations / $\bar{X}$ (X with a bar on top) = mean / $\sum$ = sigma (means you are adding something up) / $X_i$ = the i-th observation / i = 1 means start at the first observation and add every observation into the sum
Measures of the Centre 2
1) Median = Value occupying the "middle position" in an ordered set of observations / with the median the values themselves don't necessarily matter; it is whichever value occupies the middle position / Order the observations, lowest to highest, and find the middle position / Even number of observations --> Formula: 0.5(n+1) (this gives you the position)
2) Median Example, even number of observations = 0.5(6+1) = 3.5th position in the ordered set / So add the 3rd and 4th values and multiply by 0.5: 1/2 (9.1 + 9.4) = 1/2 (18.5) = 9.25 / Median temperature is 9.25
3) Mode = Value that occurs with the highest frequency / Allows you to locate the peak of a relative frequency histogram / ex. mode is 3 household members (based on the table presented)
Choosing an Appropriate Measure = Mean is usually the best measure, as it is sensitive to a change in any single observation / it takes on board the changes that can occur in the observations / it can also be the most representative / However, it is not always a good measure: when the distribution is bimodal, it does not tell you where the high peaks are / Not a good measure when:
4) Distribution is bimodal (2 modes)
5) Distribution is skewed = Instead you can use the median / It is good to use for a normal graph
6) Normal = Mean, median and mode are all found in the middle
7) Bimodal = Two modes / Mean and median in the middle
8) Positively Skewed = Mode will always be to the LEFT / Mean will always be to the RIGHT, because it takes on board the value of the outlier / Median will be somewhere in the middle
9) Negatively Skewed = Mode will be on the RIGHT / Mean will be on the LEFT / Median will always be somewhere in the middle
Measures of Dispersion 1
10) Range = Simplest measure of dispersion / Takes the difference between the smallest and largest value in the data set, at the interval/ratio scale / But influenced by outliers / Formula: Range = Xmax - Xmin (the largest value minus the smallest value)
11) Quartiles = Can yield more information and lessen the impact of outliers / Data are divided into quartiles (4 groups) / divides data into four e
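
The measures of centre (mean, median at position 0.5(n+1), mode) and the range formula from these notes can be sketched in Python as follows. The function names and the data sets are hypothetical; only the 3rd and 4th temperatures (9.1 and 9.4) come from the median example in the notes, and the other values are invented to fill out six observations.

```python
from collections import Counter


def mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Value at position 0.5 * (n + 1) in the ordered observations."""
    ordered = sorted(values)
    position = 0.5 * (len(ordered) + 1)  # 1-based position
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position falls between two observations (even n):
    # average the values on either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return 0.5 * (lower + upper)


def modes(values):
    """All values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [v for v, c in counts.items() if c == highest]


def value_range(values):
    """Range = Xmax - Xmin."""
    return max(values) - min(values)


# Six hypothetical temperatures; only 9.1 and 9.4 are from the notes.
temps = [8.2, 8.9, 9.1, 9.4, 10.3, 11.0]
print("median:", median(temps))
print("mean:", mean(temps))
print("range:", value_range(temps))

# Hypothetical household sizes with a single most frequent value.
households = [1, 2, 3, 3, 3, 4, 5]
print("mode(s):", modes(households))

# Hypothetical positively skewed data: one large outlier pulls the
# mean to the right of the median.
skewed = [1, 2, 2, 3, 50]
print("skewed mean vs median:", mean(skewed), median(skewed))
```

The last example illustrates why the notes prefer the median for skewed distributions: the single outlier moves the mean a long way but leaves the median where most of the data sit.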
Related notes for GGR270H1
