# midterm cheatsheet.docx

University of British Columbia

Land & Food Systems

LFS 252

Erin Friesen

Winter

Description

Numerical data: quantities. Categorical data: qualities that describe something, e.g. car color.
A variable is a characteristic of people or things, e.g. weight, eye color.
The population is the collection of all data values. A sample is a subset of the population, used to get a partial picture that represents a large group.

Coding categorical data: categorical data sometimes contains numbers, but it still counts as categorical. Reason: numbers are easier to enter into a computer file. E.g. "How do you like the course?" coded as 1 = ok, 2 = good, 3 = dislike. The results 1, 2, 3, 2, 3, 2, 1 are still categorical.
Categorical data in two-way tables: counts and frequencies.

Understanding causality: designing experiments
1. Observational study: uses groups that already exist and records the differences between them.
- We can only say the treatment and the outcome are associated, not that one causes the other. Why? Individuals in each group may not be identical (people who eat garlic may also eat more ginger).
- Confounding variable: another variable may actually be causing the result, e.g. maybe smoking is actually causing the disease, not eating garlic.
2. Controlled experiment: establishes causality by showing that an outcome is affected by some treatment.
- Divide subjects into a treatment group and a control group.
- The sample sizes have to be large enough to account for variability.
- Assignment to groups should be done randomly.

Placebo effect: the phenomenon of reacting after being told you received a treatment, even if there was no actual treatment.
Blind study: a controlled study where the participants do not know whether they are receiving the treatment.
Double-blind study (ideal, but not necessary for an experiment): the person collecting the data also does not know which treatment each participant is receiving.

Distribution: describes the values, frequencies, and shape of the data.

Visualizing numerical data:
1. Dot plot
- Pros: shows the individual data values, makes outliers easy to spot, describes the distribution visually.
- Cons: not very common, and not good for data with too many individual values.
2. Stem-and-leaf plot
- Stem: all digits before the last digit. Leaf: the last digit.
- Shows individual data values, but also sorts the data into bins with a width of 10.
3. Histogram (horizontal axis: numerical data; vertical axis: frequency)
- Groups data into bins (also called intervals or classes).
- Bar widths are meaningful: each bar covers a constant range of numbers, so all bars have to be the same width.
- No gaps between bars, except where a bin has no data.
- The bars have only one possible order.
- Pros: can display larger amounts of data, easy to analyze.
- Different bin widths make the chart look different. Bins that are too narrow show too much detail.

Relative frequency histogram (RFH): the vertical axis represents relative frequency. It has the same shape as the frequency histogram (FH); only the scale of the vertical axis changes. Use an RFH to see what proportion (percentage) of the total falls in each bin; an FH shows counts.

Aspects of a distribution:
1. Shape
- Skewed right: bars get lower toward the right. Skewed left: bars get lower toward the left. Symmetric: about the same amount of values on the left and right sides.
- Number of mounds: unimodal (1), bimodal (2 bumps), multimodal (more than 2).
2. Outliers: may indicate an error in the data, but not always. E.g. one person has a high salary because he is the owner, while his employees earn low wages; the value is an outlier, but it is still a fact, not an error.
3. Center
- Typical value: the highest frequencies are in the middle, as in a normal (bell-shaped) distribution.
- No typical value: the frequencies are lower in the middle, as in bimodal or skewed distributions.
4. Variability
- Low: values fall in only a few intervals (e.g. 5 intervals in total, but most values fall in 2 of them).
- High: values are spread out evenly.

Visualizing categorical data:
1. Bar chart: like a histogram, but the horizontal axis represents categories. The bars can be in any order, there are gaps between bars, and the width of the bars is meaningless, although every category's bar has to be the same width.
2. Pie chart: a circle divided into slices, where each slice is proportional to the frequency of the outcome. Better at showing how much of a share each category has of the whole.
3. Pareto chart: categories ordered from largest to smallest (left to right).
Mode: the category that occurs with the highest frequency (can be thought of as the typical outcome).
Variability: means diversity across categories, not high frequencies.
Side-by-side bar chart: each category contains two bars, e.g. a book-reading chart that separates readers into females and males.

Misleading graphs:
- Frequency scale not starting at 0.
- Symbols used rather than bars: they confuse readers, who can't tell the difference between numbers.
- Bars of unequal width.

Mean: describes the center. For a right-skewed histogram, the mean is to the right of the typical value.
SD (standard deviation): describes the spread. Large SD: the distribution (bell shape) is wide and short in the center. Small SD: the distribution is narrow and tall in the center. E.g. N(50, 8): mean 50, SD 8, normally distributed.
What counts as the majority? Use the Empirical Rule (approximate): about 68% of values fall within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.
Z-score: standardizes an observation by measuring how many SDs it is away from the mean: z = (value - mean) / SD. The resulting units are called standard units.

Skewed distribution: values pile up in a certain group of intervals and the mean is pulled toward the tail of the graph, so the mean is NOT IDEAL TO DESCRIBE THE CENTER. The median is a better representation.
Median: the middle number, or the average of the two middle numbers if the sample size is even.
Symmetric distribution: median and mean are similar. Skewed distribution: median and mean are not similar.
Quartiles: used to measure the spread of a skewed numerical distribution. 25% of values are below Q1 and 75% are below Q3. IQR (interquartile range) = Q3 - Q1, which covers the middle 50%.
An outlier affects the mean, SD, and range, but has little effect on the median and IQR.

Boxplots: show less detail (median, Q1, Q3). Useful for comparing different distributions and for finding potential outliers.
Potential outliers: data values more than 1.5 interquartile ranges below Q1 or above Q3.
Boxplots show: typical range of values, possible Q, variation.
Boxplots do not show: the mode (can't tell which value occurs most), the mean, or anything useful for small data sets, especially fewer than 5 values.

Regression analysis: examines the relationship (association) between two variables.
Scatterplots: used to investigate a positive, negative, or no association between two numerical variables.
Strength of association:
- Strong: small spread of y values.
- Weak: large spread of y values.
Linear trends: a trend is linear if there is a line and the data generally do not stray far from it.
Correlation coefficient (r): the strength of the linear
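
The z-score and Empirical Rule notes above can be checked numerically. This is a minimal sketch in Python, not part of the course material: the helper names `z_score` and `share_within` are made up for illustration, the observation 66 is a hypothetical value, and the N(50, 8) example is taken from the notes.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """How many SDs x lies from the mean (the result is in standard units)."""
    return (x - mean) / sd


def share_within(k, mean, sd):
    """Proportion of a normal distribution that lies within k SDs of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


mean, sd = 50, 8  # N(50, 8) from the notes

# Hypothetical observation: 66 is 16 above the mean, which is 2 SDs.
print(z_score(66, mean, sd))  # 2.0

# Empirical Rule: roughly 68%, 95%, and 99.7% within 1, 2, and 3 SDs.
for k in (1, 2, 3):
    print(k, round(share_within(k, mean, sd), 4))
```

The exact normal proportions are slightly different from the rounded 68 / 95 / 99.7 figures, which is why the rule is described as approximate.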

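The notes on the median, IQR, and the 1.5 x IQR rule for potential outliers can be sketched the same way. The salary figures below are invented to mirror the owner-versus-employees example, and `potential_outliers` is a hypothetical helper name. Quartile conventions vary between textbooks and software; this sketch assumes the default method of Python's `statistics.quantiles`, so hand-computed quartiles from the course may differ slightly.

```python
from statistics import mean, median, quantiles


def potential_outliers(values):
    """Return the values more than 1.5 IQRs below Q1 or above Q3."""
    # Assumption: quartiles come from statistics.quantiles' default method.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Invented salaries (in thousands): seven employees and one owner.
salaries = [30, 32, 35, 36, 38, 40, 41, 120]

print(median(salaries))              # 37.0
print(mean(salaries))                # 46.5, pulled toward the high tail
print(potential_outliers(salaries))  # [120]

# Dropping the owner barely moves the median but shifts the mean a lot.
employees = salaries[:-1]
print(median(employees), mean(employees))
```

The flagged value is only a potential outlier: as the notes say, the owner's salary is a fact, not an error.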