# COMM 215 Study Guide - Midterm Guide: Data Mining, Business Statistics, External Validity

20 pages · Winter 2015

Department: Commerce
Course Code: COMM 215
Professor: Fassil Nebebe
Study Guide: Midterm

COMM 215 Midterm
Chapter 1
Data mining: application of statistical techniques and algorithms to the analysis of large data sets
Business intelligence: application of tools/technologies for gathering, storing, retrieving, and
analyzing data that businesses collect and use
Business statistics: collection of procedures and techniques that are used to convert data into
meaningful information in a business environment
Average = $\frac{\sum_{i=1}^{N} x_i}{N}$
o N = number of data values
o $x_i$ = ith data value
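As a quick check of the average formula, here is a minimal Python sketch. The function name `average` and the sample data are hypothetical and chosen only for illustration; they are not from the course material.

```python
def average(values):
    """Return the arithmetic mean: the sum of the values divided by N."""
    n = len(values)  # N = number of data values
    if n == 0:
        raise ValueError("average of an empty data set is undefined")
    return sum(values) / n


scores = [70, 80, 90]   # hypothetical data values
print(average(scores))  # 80.0
```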
Business statistics can be divided into two categories:
1. Descriptive:
Science of describing the important aspects of a set of measurements
If the population is small → we can take a census instead of a sample
Describing data through charts, graphs and numerical measures
2. Inferential:
Science of using a sample of measurements to make generalizations about the
important aspects of a population of measurements
Includes tools/techniques that help decision makers draw inferences
(interpretations) from a set of data. Inferential procedures:
o Estimation: used when we want to know about all the data in a large data set
but it isn’t convenient to work with all the data → estimates are made by
looking closely at a subset of the larger data set
o Hypothesis testing: using statistical techniques to validate a claim
Experiment: process that produces a single outcome whose result cannot be predicted with
certainty
Internal validity:
o Characteristic of an experiment
o Collecting data in a way that ensures variables not of interest to the researcher aren’t
influencing the data
o E.g.: drug that reduces cholesterol → gender, race, weight, diet are other
factors that might affect cholesterol
External validity:
o Characteristic of an experiment
o Results of an experiment can be replicated for groups different from the
original population

Bias: an effect that alters a statistical result by systematically distorting it
Population: set of measurements obtained from all objects/individuals of interest
o Finite: fixed/limited in size
o Infinite: unlimited
Sample: a subset of a population
Census: a list of the entire set of measurements taken from the whole population
Statistical sampling techniques (probability sampling): sampling methods that use selection
techniques based on chance selection
Simple random: every possible sample of a given size has an equal chance of being
selected
Stratified random: population is divided into subgroups (strata) → each population item
belongs to only one stratum, and items within a stratum are as much alike as possible
Systematic random: selecting every kth item in the population, starting from a randomly chosen item between 1 and k. The value of
“k” = $\frac{N}{n}$ (N = population size, n = sample size)
o E.g.: 20,000 students. 500 = sample → k = $\frac{20{,}000}{500}$ = 40
  → range of the starting point = 1-40
Cluster: population is divided into groups/clusters which = mini populations
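The systematic random example above (20,000 students, sample of 500) can be sketched in Python. This is an illustration under assumptions, not the course's procedure: the function name `systematic_sample`, the `rng` parameter and the student ID list are all hypothetical, and the sketch assumes the population size is a multiple of the sample size.

```python
import random


def systematic_sample(population, sample_size, rng=random):
    """Select every k-th item after a random start within the first k items."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    k = len(population) // sample_size   # sampling interval k = N / n
    start = rng.randrange(k)             # random start: index 0..k-1 (items 1..k)
    return [population[start + i * k] for i in range(sample_size)]


students = list(range(1, 20001))         # hypothetical student IDs 1..20000
sample = systematic_sample(students, 500)
print(len(students) // 500)              # 40
print(len(sample))                       # 500
```

Passing a seeded `random.Random` instance as `rng` makes the starting point reproducible, which is convenient when checking the result by hand.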
Non-statistical techniques: non chance processes
Convenience: selects items from the population based on accessibility and ease of selection
Judgement
Other non-chance processes
Types of data:
Time series: values observed at consecutive points in time
Cross-sectional: values observed at fixed point in time
Data can be either:
1. Quantitative: values are numerical
o E.g.: dollars, pounds, inches, percentages
o Two types:
i. Interval: differences between values are meaningful, but there is no true zero point
o E.g.: temperature in Fahrenheit or Celsius
o 0 degrees = cold NOT no heat → 0 does not mean absence of quantity
ii. Ratio: characteristics of interval + has a true zero point (0= “none”)
o E.g.: Weight can have a zero point → zero = no weight
2. Qualitative: measurement scale is categorical.
o E.g.: single, married, divorced, excellent, good, fair, poor

o Two types:
i. Nominal: assigning codes to categories
o E.g.: marital status: 1. married, 2. single, 3. divorced, 4. other → the numbers
are codes and have no specific meaning
ii. Ordinal:
o Data is rank-ordered
o E.g.: indicating household income → (1) under 20,000, (2) 20,000 to
40,000, (3) over 40,000 → numbers are assigned according to
increasing income and have meaning
Chapter 2
Frequency distribution: summary of a set of data that displays the number of observations in
each of the distribution’s distinct categories/classes
E.g.:
| Number of product categories | Frequency |
| --- | --- |
| 1 | 25 |
| 2 | 29 |
| Total | 54 |
Discrete data: can take on a countable number of possible values
Relative frequency: proportion of total observations that are in a given category
o Relative frequency = $\frac{f_i}{n}$
o $f_i$ = number of times ith value appears in data set
o n = total number of observations
o $n = \sum_{i=1}^{k} f_i$ → k = number of different values for the discrete variable
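To make the relative frequency formula concrete, the following Python sketch applies it to the counts from the frequency distribution example in these notes (25 and 29, total 54). The function name `relative_frequencies` and the dictionary layout are assumptions made for illustration.

```python
def relative_frequencies(freqs):
    """Map each value to f_i / n, where n is the sum of all frequencies."""
    n = sum(freqs.values())  # n = total number of observations
    return {value: f / n for value, f in freqs.items()}


freqs = {1: 25, 2: 29}       # number of product categories -> frequency
rel = relative_frequencies(freqs)
print(round(rel[1], 3))      # 0.463
print(round(rel[2], 3))      # 0.537
```

The relative frequencies of all k categories add up to 1, which is a handy check on the arithmetic.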
Grouped frequency distributions: classes must meet four criteria
1. Must be mutually exclusive: classes shouldn’t overlap so data value can be placed in
only one class
2. Must be all-inclusive: set of classes should contain all the possible data values
3. Must be equal width: distance between lowest value and highest in each class is equal for
all classes
4. Avoid empty classes if possible
Steps for grouping data into classes:
1. Establish number of groups/classes to use
o 5-20 classes
o $2^k \geq n$ rule (n = number of data values)
E.g.: n = 230 → $2^k = 2^8 = 256 \geq 230$ while $2^7 = 128 < 230$
k = 8, therefore number of classes = 8
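The rule for choosing the number of classes can be checked with a short Python sketch that finds the smallest k with 2^k ≥ n. The function name `number_of_classes` is hypothetical, and the sketch does not enforce the 5-20 classes guideline.

```python
def number_of_classes(n):
    """Return the smallest integer k such that 2**k >= n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 0
    while 2 ** k < n:  # keep increasing k until 2^k reaches n
        k += 1
    return k


print(number_of_classes(230))  # 8
```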