STATS 10 Lecture Notes - Lecture 2: Data Set, Sample Size Determination, Scientific Control

10 Jun 2018
Chapter 1: Introduction to Data
Data
Data is info, measurements, or observations recorded or collected
More than just numbers → context is everything
Does not have to be numbers at all (e.g. images and sound files are also data; think Shazam)
Statisticians generally try to recode non-number data into numbers in order to analyze it
Data set (dataset): collection of data
Note: "data" is plural but can be used as singular; consistency is key; same with "data set"/"dataset"
Populations and Samples
Population: entire collection of objects of interest; can be people, things, or even groups of things
Usually very large and thus difficult (or impossible) to observe/collect data on directly
Sample: portion of a population of interest
Usually taken to measure certain characteristics of a population; we typically have data on a sample, not on the whole population
Sample size: number of objects of interest in the sample, usually denoted by n
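A minimal sketch of the population/sample distinction. The notes do not say how a sample is drawn; the simple random sample, the population of ID numbers, and the value n = 50 below are all assumptions made for illustration.

```python
import random

# Hypothetical population: ID numbers for 10,000 objects of interest.
population = list(range(1, 10001))

# Sample size, denoted n as in the notes (the value 50 is arbitrary).
n = 50

# Draw a simple random sample without replacement.
# The fixed seed only makes the sketch repeatable.
rng = random.Random(0)
sample = rng.sample(population, n)

print(len(population))  # size of the population
print(len(sample))      # sample size n
```

Every element of the sample comes from the population, and no object is picked twice.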
Observations and Variables
Observation (observational unit): set of data collected on an object of interest
Variable: any piece of info (e.g. characteristic, number, quantity) that can be measured or counted on an object of interest (examples: gender, height, color, area)
Data Tables
Almost always, datasets are stored as a data table
Each observation is a row in the data table (horizontal)
Each variable is a column in the data table (vertical)
# of rows = sample size (and sample size = # of observations)
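The row/column layout described above can be sketched as a small table in Python. The names, values, and variable names here are made up for illustration; each dict is one observation (row) and each key is one variable (column).

```python
# Hypothetical data table: one dict per observation (row),
# one key per variable (column).
data_table = [
    {"name": "Ana", "age": 19, "height_cm": 165, "eye_color": "brown"},
    {"name": "Ben", "age": 21, "height_cm": 180, "eye_color": "blue"},
    {"name": "Cam", "age": 20, "height_cm": 172, "eye_color": "brown"},
]

n = len(data_table)              # number of rows = sample size
variables = list(data_table[0])  # column names = variables

print(n)          # 3
print(variables)  # ['name', 'age', 'height_cm', 'eye_color']
```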
Two Types of Variables
Numeric (or quantitative) variables describe quantities of the objects of interest; values are numbers
Tells us “how much”/“how many”; e.g. age, height, temperature, GPA, weight
Categorical (or qualitative) variables describe qualities of the objects of interest; values are categories
Tells us “what type” or “what kind”; e.g. sex, hair color, class subject, eye color
If a categorical variable contains a unique value for each individual (e.g. name), it is often called a unique identifier; it is not a unique identifier if two subjects share the same value (they cannot be told apart)
WARNING: it is not always obvious if a variable is numeric or categorical just by whether the values are numerical or not; important to consider what the values represent in context!
Some variables w/ numerical values are categorical (e.g. area codes)
Some datasets code categorical variables as numbers (Yes=0 and No=1)
Some numeric variables can be recoded as categorical variables
age=numeric, but age range=categorical
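A short sketch of recoding a numeric variable as a categorical one. The cut points (18 and 30), the category labels, and the sample values are assumptions chosen for illustration, not part of the notes.

```python
def age_range(age):
    """Recode a numeric age as a categorical age range (hypothetical cut points)."""
    if age < 18:
        return "under 18"
    elif age < 30:
        return "18-29"
    else:
        return "30+"

ages = [17, 22, 35, 29]                # numeric variable
ranges = [age_range(a) for a in ages]  # categorical variable
print(ranges)  # ['under 18', '18-29', '30+', '18-29']

# Area codes look like numbers but act as labels: an "average area code"
# means nothing, so they are stored here as strings.
area_codes = ["310", "213", "310"]
```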
CONTEXT IS KEY!: most important aspect of data but is often overlooked
Who/what are the observational units? What variables were measured? How, and in what units of measurement? Who collected the data? Where? When? Why? How?
The relevance, strength, and reliability of a data set depend on the answers to these questions
Organize It
Organizing/displaying data = important for understanding what our data tells us
W/ categorical variables, we’re usually interested in how often a particular category occurs in our sample
Frequency of a value = number of times the value is observed in a data set (e.g. 27)
One way to display frequency is w/ a frequency table
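A minimal sketch of building a frequency table for a categorical variable, using `collections.Counter` from the Python standard library. The eye-color values are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical variable observed on six subjects.
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

# Frequency of a value = number of times it appears in the data set.
freq = Counter(eye_colors)

# Print the frequency table, most frequent category first.
for category, count in freq.most_common():
    print(f"{category:<8}{count}")
```

The frequencies add up to the sample size, which is a quick check that no observation was dropped or counted twice.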