SLE111 Lecture Notes - Lecture 1: Clothing Sizes, Statistic, Five Ws
Introduction to Data & Statistics Lecture Notes
The 5 Ws
What is/are statistics?
- A way of reasoning, a collection of tools and methods, designed to help us
understand the world
- Statistics (plural) are particular calculations made from data
- Statistics are about variation
- All measurements are imprecise, since there is natural variation that we cannot see
or anticipate
- Analysis and conclusions made need to incorporate the variability of results
- Statistics helps us to understand the real world in which we live
Terminology
Population: a collection of objects whose properties are to be analysed
o e.g. all students attending Deakin
Sample: a subset, or part, of a population – should be representative of the
population so valid inferences can be made
o e.g. a random selection of Deakin students
Variable: a characteristic about an element of a population or sample
o e.g. the age of a student
Data: the value of a variable
o e.g. 18 years, female
Experiment: a planned activity that results in responses (data) to compare
Parameter: a numerical characteristic that describes a population
o e.g. average age of all students
Statistic: a numerical characteristic that describes a sample
o e.g. average age of a sample of Deakin students
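The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the ages below are made up, and the names `population_ages` and `sample_ages` are hypothetical, not part of the lecture.

```python
import random
import statistics

# Hypothetical population: the ages of every student (made-up values).
population_ages = [18, 19, 19, 20, 21, 22, 22, 23, 25, 31]

# Parameter: a numerical characteristic of the whole population.
population_mean = statistics.mean(population_ages)

# Statistic: the same characteristic computed from a random sample.
rng = random.Random(0)  # fixed seed so the same sample is drawn each run
sample_ages = rng.sample(population_ages, k=4)
sample_mean = statistics.mean(sample_ages)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The sample mean will usually differ a little from the population mean, which is the variation that the analysis has to take into account.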
Data types
Qualitative or Categorical Variables:
o variables that have values that fall into distinct categories
o Values are text that tells us the category the particular case falls into. For
example:
▪ Gender (male/female)
▪ Employment status (full time, part time, casual or unemployed)
o Sometimes called nominal variables
o Numerical labels can be used to denote categorical variables
o For example:
▪ Employment status (1, 2, 3, 4)
▪ Where 1 = Full time, 2 = Part time, 3 = casual & 4 = unemployed
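As a rough sketch of numerical labels in practice, the coding above can be stored as a lookup table and used to turn codes back into category names. The list `coded_responses` is made-up data for illustration only.

```python
# Coding from the notes: 1 = Full time, 2 = Part time, 3 = Casual, 4 = Unemployed.
EMPLOYMENT_LABELS = {1: "Full time", 2: "Part time", 3: "Casual", 4: "Unemployed"}

# Hypothetical coded survey responses (made-up values).
coded_responses = [1, 3, 2, 1, 4]

# Decode each numeric label back into its category name.
decoded = [EMPLOYMENT_LABELS[code] for code in coded_responses]
print(decoded)
```

The codes are only labels: averaging them would give a number with no meaning, because the variable is still categorical.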
Quantitative Variables
o Variables have numerical values with units
o Two forms:
▪ Discrete:
• The difference between two adjacent values is constant
o Shoe size
o Dress size (8, …)
o Population
▪ Continuous:
• Infinite number of possible values within a defined range
o Length, weight, time
Variables
Identifier variable: a categorical variable that is used to uniquely identify the
individual. It does not describe the individual.
o Student ID
o Customer Number
o Transaction Number
o Tax file Number
Ordinal Variable: a variable that reports order without natural units
o Four-point Likert scale:
▪ Strongly disagree, Disagree, Agree, Strongly Agree
o Final grade: HD, D, C, P, N
o Although categorical, it can be treated as quantitative by using the rank
number
▪ 1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree
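A minimal sketch of treating an ordinal variable as quantitative through its rank numbers is shown below. The `responses` list is made-up data, and using the median as the summary is one reasonable choice, not something the notes prescribe.

```python
import statistics

# Rank numbers from the notes for the four-point Likert scale.
LIKERT_RANK = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Agree": 3,
    "Strongly Agree": 4,
}

# Hypothetical survey responses (made-up values).
responses = ["Agree", "Strongly Agree", "Disagree", "Agree", "Strongly Disagree"]

# Replace each category by its rank so that order-based summaries can be computed.
ranks = [LIKERT_RANK[r] for r in responses]
median_rank = statistics.median(ranks)

print(ranks)
print("median rank:", median_rank)
```

The ranks record order only. The gap between 1 and 2 is not guaranteed to mean the same as the gap between 3 and 4, so summaries that rely on order (such as the median) are the safer ones.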
The Five Ws
- To provide context of a study/experiment, we need the Ws and the How of the data
o Who – describe the individuals who were surveyed
o What – determine what is being measured
o When – when was the research conducted?
o Where – where was the research conducted?
o Why – what was the purpose of the survey or experiment?
o How – describe how the survey or experiment was conducted
▪ Note: sometimes it is not possible to determine all of these
Who
- The who of the data tells us the individual cases about which (or whom) we have
collected data
o Individuals who answer a survey are called respondents
o People on whom we experiment are called subjects or participants
o Animals, plants, and inanimate subjects are often called experimental units
- Sometimes people just refer to the data values as observations and are not
clear about the who
o But we need to know the who of the data so we can learn what the data say
What
- The what means the variables or characteristics recorded about each individual
- The variables should have names that identify what has been measured and
recorded
- Units should be included for quantitative variables
- The variables can be of the different types seen earlier
- Categorical (or qualitative) variable:
o Names categories and
o Answers questions about how cases fall into these categories
▪ Categorical example:
• Gender, ethnicity, favourite footy team, postcode
o Some variables act as identifier variables
▪ e.g. student ID
- Quantitative variable
o A measured variable with units
o Answers questions about the quantity of what is measured
▪ Quantitative examples:
• Income ($)
• Height (cm)
• Weight (kg)
Counts
- The counts of the number of cases in each category of a categorical variable
o Summarise the data
o Are not the data
- For example: category variable – postal method
o What types of postal method?
▪ Postal method categories – Airmail/Express/Standard
o The individuals counted are the who
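The distinction between the data and the counts that summarise them can be sketched with Python's `collections.Counter`. The `postal_methods` list is made-up data, with one entry per parcel sent.

```python
from collections import Counter

# Hypothetical data: the postal method recorded for each individual parcel.
postal_methods = ["Airmail", "Standard", "Express", "Standard", "Standard", "Airmail"]

# The counts summarise the data; they are not the data themselves.
counts = Counter(postal_methods)

for method, n in counts.most_common():
    print(method, n)
```

Here the who is each parcel, the what is the postal method, and the counts are a summary built on top of those records.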
Where, When and How
- We need the Who, What, and Why to analyse data
- But, the more we know, the more we understand
- When and Where give us some nice information about the context
- How the data are collected can make the difference between insight and nonsense
- The first step of any data analysis should be to examine and consider the Ws and
How
Cautions
- The Ws and How are not always provided
- Do not label a variable as categorical or quantitative without thinking about the
question you want it to answer
- Just because a variable's values are numbers does not mean that it is quantitative
o E.g. student ID
- Always be sceptical – don't take data or results at face value
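The caution about numbers that are not quantitative can be illustrated with a short sketch. The student IDs and ages below are invented for the example. The point is that Python will compute a mean for either list, but only one of those means answers a sensible question.

```python
import statistics

# Hypothetical records (made-up values): an identifier and a quantitative variable.
student_ids = [218004512, 218117903, 219000377]
ages_years = [18, 22, 20]

# An identifier only needs to be unique; it does not measure anything.
ids_are_unique = len(set(student_ids)) == len(student_ids)

# The mean of a quantitative variable has units and a meaning (years of age).
mean_age = statistics.mean(ages_years)

# The mean of the IDs can be computed, but it describes nothing about the students.
mean_id = statistics.mean(student_ids)

print("IDs unique:", ids_are_unique)
print("mean age (years):", mean_age)
print("mean ID (meaningless):", mean_id)
```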
Displaying Data
Data
- Can be numbers, names or other labels