midterm cheatsheet.docx

University of British Columbia
Land & Food Systems
LFS 252
Erin Friesen

Numerical data: quantities // Categorical data: qualities that describe things, e.g. car color
Variable: a characteristic of people or things, e.g. weight, eye color
Population: the collection of all data values // Sample: a subset of the population -> used to get a partial picture that represents a portion of a big group
Coding categorical data: categorical data sometimes contains numbers, but it still counts as categorical. Reason: numbers are easier to enter into a computer file. E.g. How do you like the course? 1 ok, 2 good, 3 dislike -> Results: 1,2,3,2,3,2,1 -> still categorical
Categorical data in two-way tables: counts and frequencies
Understanding causality: designing experiments
1. Observational study: uses groups that already exist and records the differences between them -> we can only say that the treatment and the outcome are associated, not that one causes the other. Why? Individuals in each group may not be identical (people who eat garlic may also eat more ginger) -> Confounding variable: another variable may actually be causing the result, e.g. maybe smoking is actually causing the disease, rather than eating garlic
2. Controlled experiment -> establishes causality: shows that an outcome is affected by some treatment -> divide individuals into a treatment group and a control group
# Sample sizes have to be large enough to account for variability
# Assignment to groups should be done randomly
Placebo effect: reacting after being told you received a treatment, even if there was no actual treatment
Blind study: a controlled study where the participants do not know whether they are receiving the treatment
Double-blind study (ideal, but not necessary for the experiment): the person collecting the data also doesn't know which treatment each participant is receiving
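Two ideas from the notes above can be sketched in a few lines of Python: tallying coded categorical responses, and randomly assigning individuals to a treatment and a control group. This is only an illustration; the participant IDs, the `assign_groups` helper, and the seed are made up for the example and are not part of the course material.

```python
import random
from collections import Counter

# Survey codes from the notes: 1 = ok, 2 = good, 3 = dislike.
LABELS = {1: "ok", 2: "good", 3: "dislike"}
responses = [1, 2, 3, 2, 3, 2, 1]

# Count how often each category occurs. The codes are labels, not
# quantities, so averaging them would be meaningless.
frequencies = Counter(LABELS[code] for code in responses)
print(frequencies)


def assign_groups(participants, seed=None):
    """Randomly split participants into (treatment, control) groups."""
    rng = random.Random(seed)      # the seed only makes the example repeatable
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


participants = [f"P{i}" for i in range(1, 21)]  # 20 hypothetical participant IDs
treatment, control = assign_groups(participants, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```

Because the shuffle decides who lands in which group, no characteristic of the participants (such as smoking) can systematically line up with the treatment, which is the point of random assignment.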
Distribution: describes the values, frequencies and shape of the data
Visualizing numerical data:
1. Dot plot
Pros: shows the individual data values, easy to spot outliers, describes the distribution visually
Cons: not very common, and not good for data with too many individual values
2. Stem-and-leaf plot
Stem: all digits before the last digit // Leaf: the last digit
Shows individual data values, but also classifies the data into bins with a width of 10
3. Histogram (horizontal: numerical data, vertical: frequency)
- Groups data into bins (also called intervals or classes)
- The widths of the bars are meaningful (they represent the bin intervals) and have to be the same size
- No gaps between bars, except where a bin contains no data
- Only one possible order
Pros: can display larger amounts of data, easy to analyze // Different bin widths display the chart differently -> bins that are too narrow show too much detail
Relative frequency histogram (RFH): the vertical axis represents relative frequency. It has the same shape as the frequency histogram (FH); only the scale of the vertical axis changes
- RFH: shows what portion of the total falls in each bin (a percentage), FH: shows the quantity (a count)
Aspects of a distribution
1. Shape: Skewed right: lower at the right, Skewed left: lower at the left, Symmetric: same amount of values on both the right and left sides. Number of mounds: unimodal (1), bimodal (2 bumps), multimodal (more than 2)
Outliers: may indicate an error in the data, or may be no error at all, e.g. one person has a high salary because he is the owner while his employees earn low wages -> even though it is an outlier, it is a fact, not an error
2. Center:
- Typical value: higher counts in the middle -> normal distribution (bell shape)
- No typical value: lower counts in the middle -> bimodal or skewed distributions
3. Variability: Low: values fall in only a few intervals (e.g. 5 intervals in total, but most values fall in 2 of them) // High: values are spread out evenly
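As a rough illustration of the binning ideas above, the sketch below builds a stem-and-leaf table, frequency counts, and relative frequencies for one data set. The scores are made-up values, and `bin_counts` and `stem_and_leaf` are hypothetical helper names; the code assumes whole-number data no smaller than the chosen starting value.

```python
# Hypothetical exam scores (not from the course).
scores = [52, 55, 61, 64, 67, 68, 70, 73, 75, 78, 81, 84, 88, 93, 97]


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    return stems


def bin_counts(values, bin_width, start):
    """Count values in equal-width bins; empty bins are kept with a count of 0."""
    n_bins = (max(values) - start) // bin_width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // bin_width] += 1
    return counts


stems = stem_and_leaf(scores)
for stem, leaves in stems.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

counts = bin_counts(scores, bin_width=10, start=50)       # frequency histogram
relative = [count / len(scores) for count in counts]      # relative frequency histogram

for i, (count, share) in enumerate(zip(counts, relative)):
    low = 50 + 10 * i
    print(f"{low}-{low + 9}: count={count}, relative={share:.2f}")
```

The counts and the relative frequencies differ only by the constant factor `len(scores)`, which is why the two histograms have the same shape.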
Visualizing categorical data:
1. Bar charts: like a histogram, but the horizontal axis represents categorical data // The bars can be in any order // There are gaps between bars // The width of the bars is meaningless, but every category's bar has to be the same width
2. Pie charts: a circle divided into slices, where each slice represents a portion proportional to the frequency of the outcome // Better display of how much of a share each category has of the whole
3. Pareto charts: bar charts with the categories ordered from largest to smallest (from left to right)
Mode: the category that occurs with the highest frequency (can be thought of as the typical outcome)
Variability: means diversity across the categories, not high frequencies
Side-by-side bar chart: a chart in which each category contains two bars, e.g. in a chart about reading books, readers are separated into females and males
Misleading graphs: frequency scale not starting at 0 // Symbols used rather than bars -> confuses readers, who can't tell the difference between numbers // Bars of unequal width
Mean: describes the center. For a right-skewed histogram, the mean is to the right of the typical value
SD: describes the spread. Large SD: the distribution (bell shape) is wide and short in the center; small SD: the distribution is narrow and high in the center => N(50, 8): mean 50, SD 8, the sample is normally distributed
Define majority?
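The summaries named above (Pareto ordering and mode for categorical data, mean and SD for numerical data) can be sketched with Python's standard library. The commute and measurement data are made up for illustration, and the sketch assumes the sample SD (dividing by n - 1), which may or may not be the convention used in the course.

```python
import statistics
from collections import Counter

# Hypothetical categorical data: how seven people commute.
commutes = ["bus", "car", "bike", "car", "bus", "car", "walk"]

# most_common() sorts categories from highest to lowest frequency,
# which is the left-to-right order of the bars in a Pareto chart.
pareto_order = Counter(commutes).most_common()
mode_category = pareto_order[0][0]  # the category with the highest frequency
print("Pareto order:", pareto_order)
print("Mode:", mode_category)

# Hypothetical numerical data, symmetric around 50.
measurements = [42, 46, 50, 54, 58]
mean = statistics.mean(measurements)  # describes the center
sd = statistics.stdev(measurements)   # sample SD (n - 1): describes the spread
print(f"mean={mean}, SD={sd:.2f}")
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population SD (dividing by n) instead.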
Use the Empirical Rule (approximate): about 68% of values fall within 1 SD of the mean, about 95% within 2 SD, about 99.7% (nearly all) within 3 SD
Z-score: standardizes an observation -> how many SDs it is away from the mean // The resulting units are called standard units
Skewed distribution: values cluster in a certain group of intervals and the mean is pulled toward the tail of the graph (NOT IDEAL TO DESCRIBE THE CENTER). The median is a better representation
Median: the middle number, or the average of the two middle numbers if the sample size is even
Symmetric distribution: median and mean are similar, Skewed distribution: median and mean are not similar
Quartiles: used to measure the spread of a skewed numerical distribution. Below Q1: 25% of values, below Q3: 75% of values, IQR (interquartile range) = Q3 - Q1, which covers the middle 50%
>> An outlier affects the mean, SD, and range, but has little or no effect on the median and IQR
Boxplots: fewer details (median, Q1, Q3) -> useful for comparing different distributions and for finding potential outliers
Potential outliers: data values that are more than 1.5 interquartile ranges below Q1 or above Q3
Boxplots show: typical range of values, quartiles, variation // Do not show: mode (can't tell which value occurs most often), mean, anything useful for small data sets, especially < 5
Regression analysis: examines the relationship (association) between two variables
Scatterplots: used to investigate a positive, negative or no association between two numerical variables // Strength of association: strong: small spread of y values, weak: large spread of y values
Linear trends: a trend is linear if there is a line that the data generally do not stray far from
Correlation coefficient (r): measures the strength of the linear association
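A small sketch of two calculations from these notes, the z-score and the 1.5 × IQR rule for potential outliers, is given below. The data set and the helper names `z_score` and `potential_outliers` are invented for the example. Quartile conventions differ between textbooks and software; this sketch uses the default method of Python's `statistics.quantiles`, so hand-computed quartiles may differ slightly.

```python
import statistics


def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def potential_outliers(values):
    """Return (q1, q3, outliers) using the 1.5 * IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # three cut points
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return q1, q3, outliers


# For a distribution described as N(50, 8), an observation of 66 is 2 SDs
# above the mean, so by the Empirical Rule it lies at the edge of the
# central range that holds roughly 95% of values.
z = z_score(66, mean=50, sd=8)
print("z =", z)

# Hypothetical data with one unusually large value.
data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 40]
q1, q3, outliers = potential_outliers(data)
print(f"Q1={q1}, Q3={q3}, IQR={q3 - q1}, potential outliers={outliers}")
```

A value flagged this way is only a potential outlier; as the notes on outliers point out, it may be a genuine observation rather than an error.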