
COMM 215 Study Guide - Midterm Guide: Data Mining, Business Statistics, External Validity

20 pages, Winter 2015

Department: Commerce
Course Code: COMM 215
Professor: Fassil Nebebe
Study Guide: Midterm

COMM 215 Midterm
Chapter 1
Data mining: application of statistical techniques and algorithms to the analysis of large data sets
Business intelligence: application of tools/technologies for gathering, storing, retrieving, and
analyzing data that businesses collect and use
Business statistics: collection of procedures and techniques that are used to convert data into
meaningful information in a business environment
Average = $\frac{\sum_{i=1}^{N} X_i}{N}$
o N = number of data values
o X_i = ith data value
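As a quick check on the average formula, here is a minimal Python sketch. The function name `average` and the data values are made up for illustration and are not from the course material.

```python
def average(values):
    """Arithmetic mean: sum of the data values X_i divided by N."""
    n = len(values)  # N = number of data values
    if n == 0:
        raise ValueError("average of an empty data set is undefined")
    return sum(values) / n


# Hypothetical data set, used only to illustrate the formula.
sample_values = [4, 8, 15, 16, 23, 42]
print(average(sample_values))
```

The sum of the six values is 108, so the average is 108 / 6 = 18.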
Business statistics can be divided into two categories:
1. Descriptive:
Science of describing the important aspects of a set of measurements
If population = small → we can take a census instead of a sample
Describing data through charts, graphs and numerical measures
2. Inferential:
Science of using a sample of measurements to make generalizations about the
important aspects of a population of measurements
Includes tools/techniques that help decision makers draw inferences
(interpretations) from a set of data. Inferential procedures:
o Estimation: used when we want to know about all the data in a large data set
but it isn’t convenient to work with all of it → estimates are made by
looking closely at a subset of the larger data set
o Hypothesis testing: using statistical techniques to validate a claim
Experiment: process that produces a single outcome whose result cannot be predicted with
certainty
Internal validity:
o Characteristic of an experiment
o Data are collected in such a way that variables not of interest to the
researcher aren’t influencing the results
o E.g.: drug that reduces cholesterol → gender, race, weight, diet are other
factors that might also affect cholesterol
External validity:
o Characteristic of an experiment
o Results of an experiment can be replicated for groups different from the
original population
o E.g.: do results obtained for Europeans also hold for Canadians?
Bias: an effect that alters a statistical result by systematically distorting it
Population: set of all measurements obtained from all objects/individuals of interest
o Finite: fixed/limited in size
o Infinite: unlimited
Sample: a subset of a population
Census: a list of the entire set of measurements taken from the whole population
Statistical sampling techniques (probability sampling): sampling methods in which items
are selected by chance
Simple random: every possible sample of a given size has an equal chance of being
selected, so every item in the population has a known/calculable chance of being selected
Stratified random: population is divided into subgroups (strata) → each population item
belongs to only one stratum, and the items within a stratum are as much alike as possible
Systematic random: selecting every kth item in the population, starting from a random
point between 1 and k. The value of “k” = $\frac{N}{n}$ (population size / sample size)
o E.g.: 20,000 students. 500 = sample → k = $\frac{20{,}000}{500}$ = 40
  → range for the starting point = 1-40
Cluster: population is divided into groups/clusters, each intended to be a mini population;
a random sample of clusters is then selected
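The systematic random procedure above can be sketched in Python. This is only an illustration under stated assumptions: the function name `systematic_sample`, the optional `rng` parameter, and the use of the numbers 1 to 20,000 to stand in for students are all made up here, and k is taken as population size divided by sample size, rounded down.

```python
import random


def systematic_sample(population, sample_size, rng=None):
    """Pick every kth item after a random start between 1 and k."""
    rng = rng or random.Random()
    k = len(population) // sample_size  # k = N / n, rounded down
    if sample_size < 1 or k < 1:
        raise ValueError("sample size must be between 1 and the population size")
    start = rng.randint(1, k)  # random starting position, 1..k inclusive
    # Positions are 1-based: start, start + k, start + 2k, ...
    positions = range(start, start + k * sample_size, k)
    return [population[p - 1] for p in positions]


# Example from the notes: 20,000 students, sample of 500 -> k = 40.
students = list(range(1, 20001))  # hypothetical student numbers
sample = systematic_sample(students, 500, random.Random(0))
print(len(sample), sample[:3])
```

Passing a seeded `random.Random` makes the starting point reproducible, which is handy for checking the result by hand.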
Non-statistical techniques: non-chance processes
Convenience: selects items from the population based on accessibility and ease of selection
Judgement
Other non-chance processes
Types of data:
Time series: values observed at consecutive points in time
Cross-sectional: values observed at a fixed point in time
Data can be either:
1. Quantitative: values are numerical
o E.g.: dollars, pounds, inches, percentages
o Two types:
i. Interval: differences between values are meaningful, but there is no true zero point
o E.g.: temperature in Fahrenheit or Celsius
o 0 degrees = cold, NOT no heat → 0 does not mean absence of the quantity
ii. Ratio: characteristics of interval + has a true zero point (0= “none”)
o E.g.: Weight can have a zero point → zero = no weight
2. Qualitative: measurement scale is categorical.
o E.g.: single, married, divorced, excellent, good, fair, poor
o Two types:
i. Nominal: assigning codes to categories
o E.g.: marital status: 1. married, 2. single, 3. divorced, 4. other → the codes
are the numbers and have no specific meaning
ii. Ordinal:
o Data is rank-ordered
o E.g.: indicating household income → (1) under 20,000, (2) 20,000 to
40,000, (3) over 40,000 → numbers are assigned according to
increasing income, so their order has meaning
Chapter 2
Frequency distribution: summary of a set of data that displays the number of observations in
each of the distribution’s distinct categories/classes
E.g.:
Number of product categories    Frequency
1                               25
2                               29
Total                           54
Discrete data: can take on a countable number of possible values
Relative frequency: proportion of total observations that are in a given category
o Relative frequency = $\frac{f_i}{n}$
o f_i = number of times the ith value appears in the data set
o n = total number of observations
o $n = \sum_{i=1}^{k} f_i$ → k = number of different values for the discrete variable
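The relative frequency calculation can be checked with a short Python sketch. The function name `relative_frequencies` is made up for illustration; the data reuse the frequency table above (value 1 appears 25 times, value 2 appears 29 times, n = 54).

```python
from collections import Counter


def relative_frequencies(data):
    """Map each distinct value to f_i / n, its share of all observations."""
    counts = Counter(data)    # f_i for each distinct value
    n = sum(counts.values())  # n = sum of all f_i
    return {value: f / n for value, f in counts.items()}


# Rebuild the raw observations from the frequency table in the notes.
observations = [1] * 25 + [2] * 29
for value, share in sorted(relative_frequencies(observations).items()):
    print(value, round(share, 3))
```

The relative frequencies always sum to 1, which is a useful sanity check on a hand-built table.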
Grouped frequency distributions: classes must meet four criteria
1. Must be mutually exclusive: classes shouldn’t overlap so data value can be placed in
only one class
2. Must be all-inclusive: set of classes should contain all the possible data values
3. Must be equal width: distance between lowest value and highest in each class is equal for
all classes
4. Avoid empty classes if possible
Steps for grouping data into classes:
1. Establish number of groups/classes to use
o 5-20 classes
o $2^k \ge n$ rule (n = number of data values, k = number of classes)
E.g.: n = 230 → $2^8 = 256 \ge 230$ while $2^7 = 128 < 230$
k = 8, therefore number of classes = 8
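The rule for choosing the number of classes (smallest k for which 2 to the power k is at least n) can be sketched as follows. The function name `number_of_classes` is an assumption made for illustration. The rule is only a starting point and should still be weighed against the 5-20 classes guideline.

```python
def number_of_classes(n):
    """Smallest k such that 2**k >= n, where n is the number of data values."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 0
    while 2 ** k < n:
        k += 1
    return k


# Example from the notes: n = 230 data values.
print(number_of_classes(230))
```

For n = 230 this returns 8, since 2^7 = 128 is too small and 2^8 = 256 is the first power of two that reaches 230.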