Lecture 1

Stat 151
Jan. 9, 17
Chapter 1:
- Information we gather is called data
- Statistics is the science of how to collect, summarize, analyze, present, and interpret data
o And of making decisions based on them
- Four main aspects of stats
o Design – how to obtain data – chapters 9-11
▪ Data must represent the population
o Description – methods to describe data – chapters 2-8
o Inference – making decisions and predictions based on data – chapters 16-25
o Probability – quantifies the chance that a statistic is accurate – chapters 12-15
- Stats uses data to gain knowledge about the world around us
o Population – entire group of objects about which information is collected
o Sample – part or subset of the population used to collect data
o Parameter – numerical summary of population
o Statistic – numerical summary of sample
o Inferences – statements about a population based on a sample
- Mean = average
- 𝜇 = mean (average) of the population
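A minimal Python sketch of parameter vs. statistic, assuming a small made-up population of exam scores (hypothetical data, not from the lecture): 𝜇 is computed from every member of the population, while x̄ is computed from a random sample of four of them.

```python
import random
import statistics

# Hypothetical population: exam scores of every student in a made-up class
population = [62, 75, 88, 91, 55, 70, 84, 79, 93, 68, 74, 81]

# Parameter: numerical summary of the whole population
mu = statistics.mean(population)

# Statistic: numerical summary of a random sample from the population
random.seed(151)  # fixed seed so the sample is reproducible
sample = random.sample(population, k=4)
x_bar = statistics.mean(sample)

print(f"population mean (parameter) mu = {mu:.2f}")
print(f"sample mean (statistic) x_bar = {x_bar:.2f}")
```

Changing the seed changes the sample and therefore x̄, while 𝜇 stays the same; this sample-to-sample variation is what inference has to account for.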
Jan. 11, 17
- Read chapters 9-11
- Variable – a characteristic of a person or thing whose value varies
o Numbers or labels
o Identify the type of each variable in a study
o Categorical/qualitative
▪ Nominal – categories have no natural order (none is better than another)
▪ Ordinal – categories follow a specific order (some rank above others)
o Measurement/quantitative
▪ Discrete – countable, whole-number quantities
▪ Continuous – can take values between whole numbers
▪ *Student ID is an identifier variable
- Data values need context
- 5 W's
o Who, what, where, when, why
Chapter 2:
- Chapter 2 focuses on summarizing categorical variables
- Population – everyone (or everything) the study is about
- Observational study – the independent variable is not controlled by the researcher
- Experimental study – the independent variable is controlled by the researcher
- Stratified samples – divide the population into groups (strata) and select people from each group
- Volunteer sample – people with strong feelings about the study choose to take part, which biases the study
- Lurking or confounding variables – variables not accounted for in the study that might affect the results
- Cannot make causal inferences in an observational study*
- Might be able to make causal inferences about the population only in a randomized experiment
- Frequency – count of observations in each category
- Relative frequency – proportion of all data in each category (frequency ÷ total)
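A short Python sketch of a frequency table, assuming hypothetical data on the faculty of ten students (made up for illustration). Counter gives the frequency of each category; dividing by the total gives the relative frequency.

```python
from collections import Counter

# Hypothetical categorical data: each student's faculty
faculties = ["Science", "Arts", "Science", "Business", "Arts",
             "Science", "Engineering", "Science", "Arts", "Business"]

freq = Counter(faculties)   # frequency: count in each category
n = len(faculties)          # total number of observations

# Relative frequency: frequency divided by the total
rel_freq = {category: count / n for category, count in freq.items()}

for category, count in freq.most_common():
    print(f"{category:<12} {count:>3} {rel_freq[category]:.2f}")
```

The relative frequencies sum to 1. Listing categories with most_common() orders them from largest to smallest, the same ordering a Pareto diagram uses.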
Jan. 13, 17
- Bar chart – for categorical (non-measurement) data
- Pareto diagram – bar chart with bars ordered from largest to smallest
- Segmented bar chart – bar divided into segments stacked one on top of another
- Pie chart
- Marginal/joint/conditional distribution**
- Contingency table – one variable on the rows, the other on the columns
o Relative frequencies differ depending on which total you divide by (grand total, row total, or column total)
▪ Dividing by a row or column total gives a conditional distribution
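A sketch in Python of how one contingency table yields different distributions depending on the total used, assuming made-up paired data on two yes/no variables (smoker, exercises regularly). Dividing each cell by the grand total gives the joint distribution, dividing the row totals by the grand total gives the marginal distribution of the row variable, and dividing each cell by its row total gives the conditional distribution of the column variable given the row.

```python
from collections import Counter

# Hypothetical paired categorical data: (smoker?, exercises regularly?)
records = [("yes", "no"), ("no", "yes"), ("no", "yes"), ("yes", "yes"),
           ("no", "no"), ("no", "yes"), ("yes", "no"), ("no", "yes")]

rows = ["yes", "no"]   # row variable: smoker
cols = ["yes", "no"]   # column variable: exercises regularly
n = len(records)

# Contingency table: count for each (row, column) combination
table = Counter(records)

# Joint distribution: each cell divided by the grand total
joint = {(r, c): table[(r, c)] / n for r in rows for c in cols}

# Marginal distribution of the row variable: row totals divided by the grand total
row_totals = {r: sum(table[(r, c)] for c in cols) for r in rows}
marginal_rows = {r: row_totals[r] / n for r in rows}

# Conditional distribution of exercise given smoker status: divide by each row total
conditional = {r: {c: table[(r, c)] / row_totals[r] for c in cols} for r in rows}

print("joint:", joint)
print("marginal (smoker):", marginal_rows)
print("conditional (exercise | smoker):", conditional)
```

Each conditional distribution sums to 1 within its row; in this made-up data 80% of non-smokers exercise versus about 33% of smokers.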
Chapter 3: Displaying and summarizing quantitative data
- Average = mean = $\bar{x} = \frac{\sum x}{n}$
- The mean does not always represent the centre of the data (outliers pull it toward them)
o Do not simply remove outliers
- Median is the centre of the ordered data
o Position of the median = $\frac{n+1}{2}$ – if the answer is a decimal, use the average of the two neighbouring data values
- Mode is the value with the highest frequency
o Does not exist when every value occurs only once
o Mode does not need to be unique – there can be more than one mode
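A minimal Python sketch of the three measures, hand-written rather than taken from a library so that the (n + 1)/2 position rule for the median is visible. The data values are made up. The second data set replaces 10 with an outlier of 100: the mean jumps while the median does not move, which is why the median is preferred when there are outliers.

```python
from collections import Counter

def mean(data):
    # x̄ = (sum of the x values) / n
    return sum(data) / len(data)

def median(data):
    # Order the data, then find position (n + 1) / 2 (counting from 1)
    ordered = sorted(data)
    pos = (len(ordered) + 1) / 2
    if pos.is_integer():
        # odd n: the position is a whole number, a single middle value
        return ordered[int(pos) - 1]
    # even n: the position ends in .5, so average the two neighbouring values
    lower = ordered[int(pos) - 1]
    upper = ordered[int(pos)]
    return (lower + upper) / 2

def modes(data):
    # Every value tied for the highest frequency (there can be more than one);
    # empty list when each value occurs only once (no mode)
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

data = [2, 3, 3, 5, 7, 10]
with_outlier = [2, 3, 3, 5, 7, 100]

print(mean(data), median(data), modes(data))     # 5.0 4.0 [3]
print(mean(with_outlier), median(with_outlier))  # 20.0 4.0
print(modes([1, 2, 3]), modes([1, 1, 2, 2, 3]))  # [] [1, 2]
```

Python's built-in statistics.median returns the same medians. Note that statistics.multimode returns every value when all values occur once, whereas modes() here returns an empty list to match the convention that there is no mode in that case.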
Jan. 16, 17
- If there are outliers, use the median; otherwise, use the mean
- Dot plot – a dot on the graph for every data value
o Not very useful
- Stem-and-leaf display
o Table-like, with the leading digit(s) (stem) on the left and the last digit of each value (leaf) on the right
- Histogram
o Good for summarizing lots of data
- Density – the area over an interval = the relative frequency of that interval
- Number of humps (modes)
o Uniform – 0 (flat, no humps)
o Unimodal
o Bimodal
o Multimodal
- Right (positive) or left (negative) skewed distribution
- Centre – the value(s) that split the data in half
o In a skewed distribution, use the median
- Spread
o How widely the values range, or how concentrated most values are around the centre – measured by range, standard deviation, IQR
- The $p^{th}$ percentile is a number so tha

