# COMM 215 Study Guide - Midterm Guide: Data Mining, Business Statistics, External Validity

Winter 2015

School: Concordia University

Department: Commerce

Course Code: COMM 215

Professor: Fassil Nebebe

Study Guide: Midterm

COMM 215 Midterm

Chapter 1

Data mining: application of statistical techniques and algorithms to the analysis of large data sets

Business intelligence: application of tools/technologies for gathering, storing, retrieving, and analyzing data that businesses collect and use

Business statistics: collection of procedures and techniques that are used to convert data into meaningful information in a business environment

Average = $\frac{\sum_{i=1}^{N} X_i}{N}$

o N = number of data values

o $X_i$ = ith data value
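The average defined above (the sum of the data values divided by N) can be checked with a minimal Python sketch. The data values below are made up for illustration and are not from the course material.

```python
def average(values):
    """Return the arithmetic mean: the sum of all data values divided by N."""
    n = len(values)          # N = number of data values
    if n == 0:
        raise ValueError("average of an empty data set is undefined")
    return sum(values) / n   # sum of X_i for i = 1..N, divided by N


# Hypothetical data values, chosen only for illustration.
data = [4, 8, 15, 16, 23, 42]
print(average(data))  # 108 / 6 = 18.0
```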

Business statistics can be divided into two categories:

1. Descriptive:

• Science of describing the important aspects of a set of measurements

• If the population is small → we can take a census instead of a sample

• describing data through charts, graphs and numerical measures

2. Inferential:

• Science of using a sample of measurements to make generalizations about the important aspects of a population of measurements

• Includes tools/techniques that help decision makers draw inferences (interpretations) from a set of data. Inferential procedures:

o Estimation: used when we want to know about all the data in a large data set but it isn’t convenient to work with all of it → estimates are made by looking closely at a subset of the larger data set

o Hypothesis testing: using statistical techniques to validate a claim

Experiment: process that produces a single outcome whose result cannot be predicted with certainty

• Internal validity:

o Characteristic of an experiment

o Collecting data in a way that variables not of interest to the researcher aren’t influencing the data

o E.g.: drug that reduces cholesterol → gender, race, weight, and diet are other factors that might affect cholesterol

• External validity:

o Characteristic of an experiment

o Results of an experiment can be replicated for groups different from the original population

o E.g.: results obtained with Europeans also holding for Canadians

Bias: an effect that alters a statistical result by systematically distorting it

Population: set of all objects/individuals of interest, or the measurements obtained from them

o Finite: fixed/limited in size

o Infinite: unlimited

Sample: a subset of a population

Census: a list of the entire set of measurements taken from the whole population

Statistical sampling techniques (probability sampling): sampling methods that select items based on chance

• Simple random: every possible sample of a given size has an equal chance of being selected, so every item in the population has the same chance of being chosen

• Stratified random: population is divided into subgroups (strata) → each population item belongs to only one stratum, and the items within a stratum are as alike as possible

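• Systematic random: selecting every kth item in the population after a random starting point between 1 and k. The value of k = population size / sample size

o E.g.: 20,000 students, sample of 500 → k = 20,000 / 500 = 40 → starting point is chosen from the range 1-40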

• Cluster: population is divided into groups/clusters which = mini populations
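Systematic random sampling from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample`, the `seed` parameter, and the student IDs are hypothetical, and the sketch assumes the population size is a multiple of the sample size, as in the 20,000 / 500 example.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    k = len(population) // sample_size      # sampling interval k = N / n
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")
    rng = random.Random(seed)
    start = rng.randrange(k)                # 0-based index, i.e. positions 1..k
    return population[start::k][:sample_size]


# Example from the notes: 20,000 students, sample of 500 -> k = 40.
students = list(range(1, 20001))            # hypothetical student IDs 1..20000
sample = systematic_sample(students, 500, seed=1)
print(len(sample), sample[1] - sample[0])   # 500 40
```

Only the starting point is random; once it is fixed, the rest of the sample is determined by the interval k.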

Non-statistical techniques: non chance processes

• Convenience: selects items from the population based on accessibility and ease of selection

• Judgement

• Other non-chance processes

Types of data:

• Time series: values observed at consecutive points in time

• Cross-sectional: values observed at fixed point in time

Data can be either:

1. Quantitative: values are numerical

o E.g.: dollars, pounds, inches, percentages

o Two types:

i. Interval: differences between values are meaningful, but there is no true zero point

o E.g.: temperature in Fahrenheit or Celsius

o 0 degrees = cold, NOT no heat → 0 does not mean absence of quantity

ii. Ratio: characteristics of interval + has a true zero point (0 = “none”)

o E.g.: weight has a zero point → zero = no weight

2. Qualitative: measurement scale is categorical.

o E.g.: single, married, divorced, excellent, good, fair, poor

o Two types:

i. Nominal: assigning codes to categories

o E.g.: marital status: 1. married, 2. single, 3. divorced, 4. other → the codes are just numbers and have no specific meaning

ii. Ordinal:

o Data is rank-ordered

o E.g.: indicating household income → (1) under 20,000, (2) 20,000 to 40,000, (3) over 40,000 → numbers are assigned according to increasing income and have meaning

Chapter 2

Frequency distribution: summary of a set of data that displays the number of observations in each of the distribution’s distinct categories/classes

E.g.:

| Number of product categories | Frequency |
| --- | --- |
| 1 | 25 |
| 2 | 29 |
| Total | 54 |

Discrete data: can take on a countable number of possible values

Relative frequency: proportion of total observations that are in a given category

o Relative frequency = $\frac{f_i}{n}$

o $f_i$ = number of times the ith value appears in the data set

o n = total number of observations

o $n = \sum_{i=1}^{k} f_i$ → k = number of different values for the discrete variable
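As a rough illustration, relative frequencies (each frequency divided by the total number of observations) can be computed as in the sketch below. It rebuilds the small example distribution from this chapter (25 observations of the value 1 and 29 of the value 2); the function name `relative_frequencies` is an assumption made for this example.

```python
from collections import Counter


def relative_frequencies(observations):
    """Map each distinct value to f_i / n."""
    counts = Counter(observations)   # f_i for each of the k distinct values
    n = sum(counts.values())         # n = sum of all f_i
    return {value: f / n for value, f in counts.items()}


# Rebuild the example distribution: 25 observations of 1 and 29 of 2.
observations = [1] * 25 + [2] * 29
rel = relative_frequencies(observations)
for value in sorted(rel):
    print(value, round(rel[value], 3))   # 1 0.463, then 2 0.537
```

The relative frequencies always sum to 1, which is a quick sanity check on any frequency table.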

Grouped frequency distributions: classes must meet four criteria

1. Must be mutually exclusive: classes shouldn’t overlap so a data value can be placed in only one class

2. Must be all-inclusive: set of classes should contain all the possible data values

3. Must be equal width: distance between the lowest and highest value in each class is equal for all classes

4. Avoid empty classes if possible

Steps for grouping data into classes:

1. Establish number of groups/classes to use

o 5-20 classes

o $2^k \ge n$ rule (n = number of data values)

▪ E.g.: n = 230 → $2^8 = 256 \ge 230$ while $2^7 = 128 < 230$

▪ k = 8, therefore number of classes = 8
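The rule of choosing the smallest k for which 2 to the power k is at least n can be checked with a short Python sketch. The function name `number_of_classes` is hypothetical and used only for this illustration.

```python
def number_of_classes(n):
    """Return the smallest integer k such that 2**k >= n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    k = 0
    while 2 ** k < n:
        k += 1
    return k


print(number_of_classes(230))  # 8, since 2**8 = 256 >= 230 and 2**7 = 128 < 230
```

The result is a starting point; the guideline of 5-20 classes still applies when the rule suggests very few or very many classes.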
