CCT226H5 Study Guide - Fall 2018, Comprehensive Midterm Notes - Frequency Distribution, Histogram, Chart
CCT226H5
MIDTERM EXAM
STUDY GUIDE
Fall 2018
CCT226 Data Analysis I
Week 1
Overview
• Data analysis and statistics
o Data
o Data sources
o Descriptive statistics
o Statistical inference
• Introduction
o Living in the age of technology has implications for everyone entering the business
world
▪ Technology makes it possible to collect huge amounts of data
▪ Technology has given more people the power and responsibility to analyze data
and make decisions
o A large amount of data already exists and will only increase in the future
o One of the hottest topics in today’s business world is business analytics
▪ This term encompasses all of the types of analysis discussed in this course
▪ It also typically implies the analysis of very large data sets: Big Data Analytics
o By using quantitative methods to uncover the information in these data sets and then
acting on this information, companies are able to gain a competitive advantage
• Data and Data Sets
o Data are the facts and figures collected, summarized, analyzed, and interpreted
o The data collected in a particular study are referred to as the data set
• Elements, variables, and observations
o Elements are entities on which data are collected
▪ a.k.a. individuals
o a variable is a characteristic of interest for the elements
o the set of measurements collected for a particular element is called an observation
o the total number of data values in a complete data set is the number of elements
multiplied by the number of variables
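The counting rule above can be checked with a small sketch. The data set below is hypothetical (made-up students and values, not from the course); it only illustrates how elements, variables, and observations relate.

```python
# Hypothetical data set: each key is an element (individual),
# each inner dict is that element's observation (one value per variable).
data_set = {
    "Student A": {"height_cm": 170, "age": 21, "hair_colour": "brown"},
    "Student B": {"height_cm": 165, "age": 19, "hair_colour": "black"},
    "Student C": {"height_cm": 180, "age": 24, "hair_colour": "blond"},
    "Student D": {"height_cm": 158, "age": 22, "hair_colour": "brown"},
}

n_elements = len(data_set)                          # number of elements
n_variables = len(next(iter(data_set.values())))    # variables per observation

# Total data values = number of elements x number of variables
total_values = n_elements * n_variables

# Counting every stored value directly should give the same total.
counted = sum(len(observation) for observation in data_set.values())

print(n_elements, n_variables, total_values, counted)
```

With 4 elements and 3 variables, the complete data set holds 12 data values.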
• summary and more about variables
o collect information – data – from individuals
o individuals can be people, animals, plants, or any object of interest
o a variable is any characteristic of an individual and varies among individuals
▪ ex. Height, age, ethnicity, languages
o distribution of a variable tells us what values the variable takes and how often it takes
these values
o Two types of variables:
▪ Either quantitative
• Something that takes numerical values for which arithmetic operations
such as adding and averaging make sense
o Ex. Height, age, blood cholesterol level, number of credit cards
owned
• Also called measurement variable
• Scales of measurement: interval/ratio
▪ or categorical
• something that falls into one of several categories
o what can be measured is the count or the proportion of
individuals in each category
o ex. Blood types, hair colour, ethnicity, whether you paid income
tax last year or not
• categorical variables
o binary: most basic categorical data; only 2 possible values
(yes/no, accept/reject, male/female, 0/1)
o nominal: extension of binary to more than 2 categories but
categories are unordered – they are named (marital status, eye
colour, industry sector)
o ordinal: extension of binary to more than 2 categories but
categories are ordered (point scale such as better/same/worse,
rankings, level of education)
• can be numerical or non-numerical
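A minimal sketch of the quantitative/categorical distinction, using made-up values (the ages and blood types below are hypothetical). Averaging makes sense for the quantitative variable; for the categorical variable, the distribution is described by the count or proportion in each category.

```python
from collections import Counter
from statistics import mean

# Hypothetical values for five individuals, made up for illustration.
ages = [19, 21, 21, 24, 22]                # quantitative variable
blood_types = ["O", "A", "O", "B", "O"]    # categorical (nominal) variable

# Quantitative: arithmetic such as averaging is meaningful.
average_age = mean(ages)

# Categorical: describe the distribution with counts and proportions.
counts = Counter(blood_types)
proportions = {category: n / len(blood_types) for category, n in counts.items()}

print(average_age)
print(dict(counts))
print(proportions)
```

Note that a categorical variable coded with numbers (e.g. 0/1) is still categorical: averaging the codes does not turn it into a measurement.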
• Data sources
o Statistical studies
▪ in experimental studies, the variable of interest is first identified
▪ one or more other variables are identified and controlled so that data can be
obtained about how they influence the variable of interest
▪ in observational/non-experimental studies no attempt is made to control or
influence the variables of interest
• survey is a good example
• Data acquisition considerations
o Time requirement
▪ Search for information can be time consuming
▪ Information may no longer be useful by the time it is available
o Cost of acquisition
▪ Organizations often charge for information even when it is not their primary
business activity
o Data errors
▪ Using any data that happen to be available or were acquired with little care can
lead to misleading information
• Descriptive statistics
o Are the tabular, graphical, and numerical methods used to summarize and present data
▪ Ex. Hudson Auto Repair
• Examining costs of parts based on customer invoices
• Tabular summary: frequency and percent frequency
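A tabular summary of this kind could be sketched as follows. The parts costs below are hypothetical placeholders, not the actual Hudson Auto Repair data, and the class width of 10 is an assumption chosen only for illustration.

```python
from collections import Counter

# Hypothetical parts costs in dollars (NOT the real invoice data).
parts_costs = [52, 61, 75, 68, 91, 58, 73, 84, 66, 79]
class_width = 10  # assumed class width

# Frequency: how many costs fall in each class, keyed by class lower bound.
frequency = Counter((cost // class_width) * class_width for cost in parts_costs)
total = len(parts_costs)

# Percent frequency: each class's share of all observations, times 100.
table = [
    (f"{lower}-{lower + class_width - 1}", frequency[lower], 100 * frequency[lower] / total)
    for lower in sorted(frequency)
]

print(f"{'Class':>7} {'Frequency':>10} {'Percent':>8}")
for label, count, percent in table:
    print(f"{label:>7} {count:>10} {percent:>8.1f}")
```

The frequencies always sum to the number of observations, and the percent frequencies sum to 100.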