LFS 252

Chapter 1: Data

Learning objectives:
- Suggest confounding variables that prevent us from inferring causation
  Confounding variables: people who eat garlic may also eat more of other foods that affect the outcome
- Numerical and categorical variables

1. Data are observations that you or someone else records
2. Data can be numbers: measurements, e.g. weight, height
3. Data can be nonnumeric: eye color, drinking or not

Numerical: quantities
Categorical: qualities that describe things, e.g. car color

- A variable is a characteristic of people or things, e.g. weight, eye color
- A population is the collection of all data values
- A sample is a subset of the population
  -> A sample is used to get a partial picture, representing a portion of a big group
- Coding categorical data: categorical data sometimes contain numbers, but they still count as categorical data
  Reason: easier to input into a file (into a computer)
  e.g. How do you like the course? 1 ok, 2 good, 3 dislike
  -> Results: 1,2,3,2,3,2,1 -> still categorical
- Categorical data in two-way tables: counts and frequencies

Understanding causality:
- Designing experiments

1. Observational study: uses groups that already exist and records the differences between them
   - We can only say the treatment and outcome are associated; we cannot claim causation
   - Why? Individuals in each group may not be identical (people who eat garlic may also eat more ginger)
   - Confounding variable -> another variable may actually be causing the result
     e.g. maybe smoking is actually causing the disease, rather than garlic intake
2. Controlled experiment
   - Establishing causality: show that an outcome is affected by some treatment
   - Divide participants into a treatment group and a control group
     - The sample sizes have to be large enough to account for the variability
     - Assignment to groups should be done randomly

Placebo effect: the phenomenon of people reacting after being told they received a treatment, even if there was no actual treatment
Blind study: a controlled study where the participants do not know whether they are receiving the treatment
Double-blind study (ideal, but not necessary for the experiment): when the person collecting the data also doesn't know which treatment each participant is receiving

Chapter 2: Visualizing data

Why visualize data -> a more effective way to summarize data
- Can see patterns, variables, relationships; easy to understand
- Can show wrong data values

Distribution: describes the values, frequencies, and shape of the data

Visualizing numerical data:
1. Dot plot
   Pros: shows the individual data values, easy to spot outliers, describes the distribution visually
   Cons: not very common, and not good for data with too many individual values
2. Stem a
