STATS 10 Lecture Notes - Lecture 1: Simple Random Sample, Sampling Frame, Cancer Staging
Introduction: What is Statistics Anyway? An Overview of the Statistical Perspective
● Statistics is everywhere; arises in various disciplines and has everyday applications
Variation and Data: two major concepts underlying statistics
● Inherent element of randomness (chance) in each situation; same scenario can have different outcomes
○ → natural variation or variability
■ If the same scenario always has the same outcome, scenario is deterministic
● Predictions/forecasts/conclusions are all based on data → the info/measurements/observations that are recorded/collected
○ Data often doesn’t give us the complete picture of a situation, but stats gives us a way of making conclusions while accounting for uncertainty
What even is Statistics?
● Statistics is:
○ The study of collecting, analyzing, and making conclusions from data
○ A tool for understanding what data can/cannot tell us about the world
○ Systematic framework for quantifying uncertainty
● Power of stats lies in its applications, not its mathematical foundations
Overview of the Statistical Process
1. Start w/ research question (e.g. Does this new medication help fight lung cancer?)
2. Determine relevant population, i.e. target group of people or things of interest (e.g. population = people w/ lung cancer)
3. (Usually) the population is too difficult to observe directly, so we take a sample, a portion of the population, and observe/study the sample
a. E.g. sample = select group of lung cancer patients
4. Statistical inference: based on what we see in the sample, we can make conclusions about the population
a. E.g. conclusion that medication is helpful (it slows lung cancer progression by a certain rate)
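The four steps above can be sketched in code. This is only an illustration under made-up assumptions: the population values are simulated "progression rate" numbers (not real data), and the names `estimate_from_sample`, `population`, and `sample_size` are hypothetical. The point is that a summary computed on the sample is used to make a conclusion about the population.

```python
import random
import statistics


def estimate_from_sample(population, sample_size, seed=0):
    """Draw a random sample and return (sample mean, population mean)."""
    rng = random.Random(seed)
    # Step 3: observe only a portion of the population
    sample = rng.sample(population, sample_size)
    # Step 4: the sample mean is our estimate of the population mean
    return statistics.mean(sample), statistics.mean(population)


# Step 2: a hypothetical population of 10,000 simulated values (assumption)
rng = random.Random(42)
population = [rng.gauss(5.0, 1.5) for _ in range(10_000)]

sample_mean, population_mean = estimate_from_sample(population, 200)
print(f"sample mean: {sample_mean:.2f}, population mean: {population_mean:.2f}")
```

In practice the population mean is unknown (that is why we sample); it is computed here only so the two numbers can be compared.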
Representative Samples
● For valid conclusions, sample must be representative of the population; if not, sample is biased
○ → e.g. biased sample of lung cancer patients: all in same cancer stage, all same gender, or all diabetic; does not accurately represent population as a whole
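A small simulation can show why a sample drawn from only one cancer stage is biased. Everything here is an assumption for illustration: the population is simulated, and the rule linking stage to an "outcome score" is invented. The sketch compares a stage-1-only sample with a random sample of the same size.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: each patient is (stage, outcome score).
# Invented rule: higher stage -> lower score on average.
population = []
for _ in range(8_000):
    stage = rng.choice([1, 2, 3, 4])
    score = 100 - 15 * stage + rng.gauss(0, 5)
    population.append((stage, score))

population_mean = statistics.mean(score for _, score in population)

# Biased sample: only patients in stage 1
stage_one = [p for p in population if p[0] == 1]
biased_sample = rng.sample(stage_one, 200)
biased_mean = statistics.mean(score for _, score in biased_sample)

# Random sample from the whole population, same size
random_sample = rng.sample(population, 200)
random_mean = statistics.mean(score for _, score in random_sample)

print(f"population: {population_mean:.1f}")
print(f"biased (stage 1 only): {biased_mean:.1f}")
print(f"random sample: {random_mean:.1f}")
```

The stage-1-only sample overstates the average score because it leaves out the later stages, while the random sample lands much closer to the population value.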
Simple Random Sampling
● In general very difficult to get a representative sample
● One way (not guaranteed) is simple random sampling (SRS)
○ Start w/ sampling frame: list of everyone/everything in population
○ w/ sampling frame, select person/thing @ random one by one
● Properties of SRS:
○ Every person/thing in population has equal chance of being selected
○ Every possible sample of a given size has equal chance of being selected
○ Bc of random chance, some samples selected may not be representative of the population
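The SRS properties above can be checked on a tiny example. This is a sketch under assumptions: the sampling frame of five units labelled "A" to "E" is hypothetical, and Python's `random.sample` (selection without replacement) stands in for picking units at random one by one.

```python
import itertools
import math
import random

# Hypothetical sampling frame: a list of every unit in a tiny population
sampling_frame = ["A", "B", "C", "D", "E"]
sample_size = 2

# Every possible sample of size 2 (order ignored)
all_samples = list(itertools.combinations(sampling_frame, sample_size))
num_samples = math.comb(len(sampling_frame), sample_size)

# Under SRS, each possible sample is equally likely
sample_probability = 1 / num_samples

# Each unit appears in the same share of the possible samples
samples_with_a = [s for s in all_samples if "A" in s]
inclusion_probability = len(samples_with_a) / num_samples

# Draw one simple random sample from the frame
rng = random.Random(0)
srs = rng.sample(sampling_frame, sample_size)

print(num_samples, sample_probability, inclusion_probability, srs)
```

With 5 units and samples of size 2 there are 10 possible samples, so each has chance 1/10, and each unit is in 4 of them, giving an inclusion chance of 4/10 = 2/5.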
Sampling is hard
● Sampling frame is often v difficult to obtain
● In this course, assume we are able to find sample on which we can collect data (info)
● Whether we can assume the sample is representative will depend on context