Chapter 2

Statistics - Chapter 2,3,5 Notes.docx

10 Pages

Course Code
Fouzia Baki

CHAPTER TWO: DATA

What Are Data?
• Transactional data: data collected from recording a company’s transactions
• Business analytics: any use of statistical analysis to drive business decisions from data
• Data help businesses predict what customers may buy in the future
• Data help them know how much of each item to stock
• Need to provide context for data values
• Must answer Who, What, When, Where, Why, How
• Data table:
   o Usually the rows answer who
   o Column headers answer what
• Respondents: individuals who answer a survey
• Subjects/participants: people experimented on
• Experimental units: companies, websites, inanimate subjects that are experimented on
• Variables: characteristics recorded about each individual or case
• Relational database: two or more separate data tables that are linked together so info can be merged across them
• In statistics, analysis is performed on a single data table

Variable Types
• Categorical variable:
   o When a variable names categories and answers questions about how cases fall into those categories
• Quantitative variable:
   o When a variable has measured numerical values and tells us about the quantity of what is measured
   o Some have units (kg, cm, etc.) and some do not
• Classifying a variable helps us decide what to do with it
• Variables recorded as words are usually categorical
• Not all numerical variables are quantitative
• Categorical data can be numbers (ex. area codes)

Counts
• Counting is a way to summarize a categorical variable
• Does not mean that every time something has been counted the variable is categorical
• Also use counts to measure amounts
• Measured quantities are associated with quantitative variables

Identifiers
• Identifier variable: has exactly as many categories as there are things being categorized
• Make it possible to combine data from different sources, protect confidentiality, provide unique labels
• Ex. customer number, product ID, social insurance number
• Do not need to analyze identifiers because they will not give any useful info
• Variables can play different roles depending on the question asked of them

Other Variable Types
• Nominal variables: categorical variables used only to name categories
• Ordinal variables: variables whose values can be ordered (to pick out first, last, middle value)
• Interval scale: measurements for which differences are meaningful but ratios are not (ex. temperature in °C)
• Ratio scale: measurements for which ratios are appropriate

Cross-Sectional and Time Series Data
• Time series data: measuring the same variable at different intervals over time (ex. months, quarters, years)
• Cross-sectional data: several variables measured at the same point in time

Primary and Secondary Data
• Primary: data we collect ourselves (ex. taking a survey)
• Secondary: data collected by someone else

Where, How and When
• Have to know Who, What and Why to analyze data
• If possible want to know When and Where as well
• How the data are collected matters
• Ex. data from a voluntary survey on the internet are unreliable because only people interested in the topic will fill out the questionnaire
• Know three W’s before analyzing data:
   o Why you are examining the data (what you want to know)
   o Whom each row of the data table refers to
   o What the variables (the columns of the table) record

CHAPTER THREE: SURVEYS AND SAMPLING

Three Features of Sampling

Feature 1: Examine a Part of the Whole
• Select a sample from the population to examine
• A small sample can represent the entire population
• Sample survey: designed to ask questions of a small group of people in the hope of learning something about the entire population
• Have to make sure the sample represents the population fairly
• Biased: the summary characteristics of a sample differ from the corresponding characteristics of the population it is trying to represent

Feature 2: Randomize
• Randomizing helps give us a representative sample even for effects we were unaware of
• Must make sure that the sample looks like the rest of the population
• Two things make randomization fair:
   1. Nobody can guess the outcome before it happens
   2. Underlying set of outcomes will be equally likely
• Randomize because it is hard to build every possible relevant characteristic of the population into the sample (income level, age, political affiliation, marital status, number of children, place of residence)
• Can see how well randomizing works by taking two random samples from the population
• The summaries of the two samples will end up almost the same
• Sampling variability: the sample-to-sample differences

Feature 3: The Sample Size Is What Matters
• Size of the sample determines what can be concluded from the data
• Size of the population does not matter (ex. do not need a specific percentage or fraction)
• The fraction of the population sampled does not matter; the sample size does
• Need a large enough sample so it is representative of the population
• Most common polling method used by professionals is the telephone

A Census – Does It Make Sense?
• Census: an attempt to collect data on the entire population of interest
• Reasons why a census may not provide the best info:
   o Difficult to complete a census (some people are hard to locate)
   o Population being studied may change (babies are born, people travel, people die in the time it takes to complete the census)
   o Takes a lot of effort

Populations and Parameters
• Can use models to represent reality
• Models of data can give us summaries we can learn from
• Parameters: key numbers in models
• Population parameter: parameter used in a model for a population
• Statistic: anything calculated from a sample
• Sample statistic: statistics matched with the parameters they estimate
• Representative sample: sample from which the statistics computed accurately reflect the corresponding population parameters

Simple Random Sampling (SRS)
• Simple random sample: sample in which each set of n individuals in the population has an equal chance of selection
• Sampling frame: list of individuals from which the sample will be drawn
• Easiest way to choose an SRS is with random numbers
• Assign a sequential number to each individual in the sampling frame

Other Random Sample Designs

Stratified Sampling
• Stratified random sampling: population is divided into several homogeneous subpopulations (strata) and random samples are drawn from each stratum
• Need to make sure the proportions of each group within the sample match the proportions of those groups in the population
• Ex. if the population is 60% women and 40% men, choose a sample that is 60% women and 40% men
• Can also stratify by age, race, income, etc.
• Biggest benefit of stratified random sampling is that it results in reduced sampling variability
• Can find info about individual strata as well as the whole population
• MUST take samples from every stratum in the population
• Cannot discard strata because they seem small (will have a large impact in the end)

Cluster Sampling
• Cluster sampling: representative subset of a population chosen for reasons of convenience, cost, or practicality
• Split the population into clusters and select a few clusters at random to census
• Will generate an unbiased sample if each cluster fairly represents the population
• Stratified vs. cluster sampling:
   1. Stratify to ensure the sample represents different groups in the population and to reduce sample-to-sample variability; cluster to save money or make the study practical
   2. Strata are internally homogeneous but differ from one another; clusters are mostly alike (each is heterogeneous and resembles the overall population)

Systematic Sampling
• Systematic sample: sample drawn by selecting individuals systematically from a sampling frame
• Ex. select every 10th employee on an alphabetical list
• Start with a randomly selected person to ensure it is still random
• Can give a representative sample if the order of the list is not associated with the responses measured
• Much less expensive than SRS
• Must be careful that the sampling frequency is not related to something about the process being sampled

Multistage Sampling
• Multistage sampling: sampling schemes that combine several sampling methods
• Might be most useful for sampling large amounts of data

Practicalities
1. Who
   o Have to be able to define the group you want to survey
   o Might have an idea but not know exactly the right people to choose
   o Make sure to choose people whose answers will be meaningful
2. Sampling frame
   o Must specify the sampling frame
   o Sampling frame limits what the survey can find out
3. Target sample
   o The individuals for whom you intend to measure responses
   o Nonresponse is a problem in most surveys
4. Sample
   o Actual respondents
   o Individuals you get data from and can draw conclusions about
   o Might not be representative of the sampling frame/population

The Valid Survey
• Must ask four questions before creating a valid survey:
1) What do I want to know?
   o Need to know what you want to learn and from whom
   o Do not want to ask unnecessary questions
   o A longer survey means fewer people will answer
2) Who are the appropriate respondents?
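
Worked sketch: the random sampling designs from Chapter Three (simple random, stratified, systematic) can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not a method taken from the textbook: the sampling frame of 100 numbered people (60% women, 40% men), the function names, and the fixed seed are all made up for illustration.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, rng):
    """SRS: every set of n individuals in the frame is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, n, rng):
    """Draw a random sample from every stratum, in proportion to its size."""
    strata = defaultdict(list)
    for person in frame:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        # Proportional allocation, with at least one draw per stratum so that
        # no stratum is skipped. Rounding can make the total differ slightly
        # from n when the proportions do not divide evenly.
        k = max(1, round(n * len(members) / len(frame)))
        sample.extend(rng.sample(members, k))
    return sample

def systematic_sample(frame, step, rng):
    """Random starting point, then every step-th individual on the list."""
    start = rng.randrange(step)
    return frame[start::step]

# Hypothetical sampling frame: (sequential number, sex), 60 women and 40 men.
frame = [(i, "F" if i < 60 else "M") for i in range(100)]
rng = random.Random(0)  # fixed seed only so the illustration is repeatable

srs = simple_random_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda person: person[1], 10, rng)
syst = systematic_sample(frame, 10, rng)

print("SRS women:", sum(1 for p in srs if p[1] == "F"))           # varies by sample
print("Stratified women:", sum(1 for p in strat if p[1] == "F"))  # always 6 here
print("Systematic numbers:", [p[0] for p in syst])
```

The stratified sample always contains 6 women and 4 men in this setup, while the number of women in the SRS changes from sample to sample. That difference is the reduced sampling variability the notes describe as the biggest benefit of stratifying.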