
ITEC 4230 - Midterm Exam Guide - Comprehensive Notes for the Exam (13 pages long!)


Department: Information Technology
Course Code: ITEC 4230
Professor: Xiaofeng Zhou
Study Guide: Midterm

York, ITEC 4230: Midterm Exam Study Guide

4230 Final Notes
Lecture 1:
Data Mining
o Extraction of interesting (non-trivial, implicit, previously unknown & potentially useful)
patterns or knowledge from huge amounts of data
Knowledge Discovery in Databases (KDD) Process
o Learning the application domain
Relevant prior knowledge & goals of application
o Creating target data set
o Data cleaning & preprocessing
o Data reduction & transformation
Find useful features, dimensionality/variable reduction, invariant representation
o Choosing functions of data mining
Summarization, classification, regression, association, clustering
o Choosing the mining algorithm(s)
o Data mining
o Pattern evaluation & knowledge presentation
Visualization, transformation, removing redundant patterns
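A minimal sketch of the KDD steps above as a chain of small Python functions, assuming a toy list of hypothetical purchase records. The record fields, the pair-counting "mining" step and the 50% threshold are illustrative choices, not part of the course notes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; None marks a missing value.
RAW_RECORDS = [
    {"id": 1, "items": ["milk", "bread"], "region": "north"},
    {"id": 2, "items": ["milk", "bread", "eggs"], "region": None},
    {"id": 3, "items": None, "region": "south"},
    {"id": 4, "items": ["bread", "eggs"], "region": "south"},
    {"id": 5, "items": ["milk", "bread"], "region": "north"},
]


def clean(records):
    """Data cleaning: drop records whose 'items' value is missing."""
    return [r for r in records if r["items"] is not None]


def transform(records):
    """Reduction/transformation: keep only the attribute being mined, as a set."""
    return [frozenset(r["items"]) for r in records]


def mine(itemsets):
    """Mining: count how often each pair of items appears in the same record."""
    counts = Counter()
    for items in itemsets:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


def evaluate(counts, n_records, min_fraction=0.5):
    """Pattern evaluation: keep pairs found in at least min_fraction of the records."""
    return {pair: c / n_records for pair, c in counts.items() if c / n_records >= min_fraction}


def kdd(records):
    """Run cleaning -> transformation -> mining -> evaluation in order."""
    data = transform(clean(records))
    return evaluate(mine(data), len(data))


print(kdd(RAW_RECORDS))  # {('bread', 'milk'): 0.75, ('bread', 'eggs'): 0.5}
```

Note that the "region" attribute is dropped during transformation, and record 3 never reaches the mining step because cleaning removed it.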
Data Mining Functionalities
o Characterization/Generalization, Association & correlation analysis, Classification,
Cluster analysis, outlier analysis, sequential pattern/trend, structure & network analysis
Classification Schemes
o Different views lead to different classifications
Data view: Kinds of data to be mined
Knowledge view: Kinds of knowledge to be discovered
Method view: Kinds of techniques utilized
Application view: Kinds of applications adapted
o General functionality
Descriptive data mining
Predictive data mining
Classification
o Given a collection of records (training set)
Each record contains a set of attributes; one of the attributes is the class
o Find a model for class attribute as a function of the values of other attributes
o Goals
Previously unseen records should be assigned a class as accurately as possible
A test set is used to determine the accuracy of the model. A given data
set is divided into training and test sets, where training set is used to
build the model and test set is used to validate it
o Example
Goal: reduce mailing costs by targeting a set of consumers likely to buy a new cell
phone
Approach
Use data for similar product introduced before
We ko hih ustoes deided to uy ad hih did’t. This {uy,
do’t uy} deisio fos the lass attiute
Collect various demographic, lifestyle and company interaction related
info about all the customers
o Type of business, where they stay, income etc.
Use info as input attributes to learn a classifier model
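A minimal sketch of the train/test procedure applied to a buy/don't-buy problem like the one above. The customer attributes (income in $1000s, number of past purchases) and their values are hypothetical, and the classifier is a 1-nearest-neighbour rule chosen only for brevity; the notes do not specify a model. Features are not normalized here, which a real model would usually need.

```python
import math

# Hypothetical customers: ((income in $1000s, past purchases), class label).
CUSTOMERS = [
    ((25, 0), "don't buy"),
    ((30, 1), "don't buy"),
    ((28, 0), "don't buy"),
    ((85, 4), "buy"),
    ((90, 5), "buy"),
    ((78, 3), "buy"),
    ((32, 1), "don't buy"),
    ((95, 6), "buy"),
    ((27, 0), "don't buy"),
    ((88, 4), "buy"),
]


def holdout_split(records, every=3):
    """Send every `every`-th record to the test set; the rest form the training set."""
    train = [r for i, r in enumerate(records) if i % every != every - 1]
    test = [r for i, r in enumerate(records) if i % every == every - 1]
    return train, test


def predict(train, x):
    """1-nearest neighbour: return the class of the closest training record (Euclidean)."""
    _, label = min(train, key=lambda rec: math.dist(rec[0], x))
    return label


def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(predict(train, x) == y for x, y in test)
    return correct / len(test)


train, test = holdout_split(CUSTOMERS)
print(len(train), len(test), accuracy(train, test))  # 7 3 1.0
```

The model is built only from `train`; `test` is held back to estimate how well previously unseen customers would be classified.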
Clustering
o Given a set of data points, each having a set of attributes, and a similarity measure
among them, find clusters such that
Data points in one cluster are more similar to one another
Data points in separate clusters are less similar to one another
o Similarity measures
Euclidean distance if attributes are continuous
Intracluster distances minimized
Intercluster distances maximized
Other problem specific measures
o Example
Market segmentation
Goal
o Subdivide a market into distinct subsets of customers where any
subset may conceivably be selected as a market target to be
reached with a distinct marketing mix
Approach
o Collect different attributes of customers based on their
geographical and lifestyle related information
o Find clusters of similar customers
o Measure clustering quality by observing buying patterns of
customers in same cluster vs those from different clusters
o Document clustering
Similarity based on how many words documents have in common (after some word filtering)
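A minimal sketch of clustering with Euclidean distance, assuming continuous 2-D points. It uses k-means (Lloyd's algorithm), one common clustering method that the notes do not name, with hand-picked starting centroids. The intra- and inter-cluster summaries (mean point-to-centroid distance, smallest centroid-to-centroid distance) are one illustrative way to check the two goals above, and `shared_words` with its tiny stopword list is a hypothetical stand-in for the "common words after filtering" idea.

```python
import itertools
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means with Euclidean distance, starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


def intra_cluster_distance(clusters, centroids):
    """Mean distance from each point to its own centroid (should be small)."""
    dists = [math.dist(p, c) for cluster, c in zip(clusters, centroids) for p in cluster]
    return sum(dists) / len(dists)


def inter_cluster_distance(centroids):
    """Smallest distance between two centroids (should be large)."""
    return min(math.dist(a, b) for a, b in itertools.combinations(centroids, 2))


def shared_words(doc_a, doc_b, stopwords=frozenset({"a", "and", "of", "the", "to", "with"})):
    """Hypothetical document similarity: number of distinct non-stopwords both documents use."""
    words_a = {w for w in doc_a.lower().split() if w not in stopwords}
    words_b = {w for w in doc_b.lower().split() if w not in stopwords}
    return len(words_a & words_b)


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
clusters, centroids = kmeans(points, centroids=[(1, 1), (8, 8)])
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
print(intra_cluster_distance(clusters, centroids) < inter_cluster_distance(centroids))  # True
print(shared_words("The price of the phone", "a phone with a low price"))  # 2
```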
Association Rule Discovery
o Given a set of records, each of which contains some number of items from a given collection
Produce dependency rules which will predict occurrence of an item based on
occurrence of other items
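A minimal sketch of rule discovery over a small hypothetical basket list. It scores candidate rules with support and confidence, the standard measures for association rules (not defined in these notes), and only checks rules with one item on each side; the thresholds are arbitrary. A full miner such as Apriori would also build larger itemsets.

```python
from itertools import permutations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "coke", "milk"},
    {"beer", "bread"},
    {"beer", "coke", "diaper", "milk"},
    {"beer", "bread", "diaper", "milk"},
    {"coke", "diaper", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)


def support(itemset, transactions):
    """Fraction of transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return count(lhs | rhs, transactions) / count(lhs, transactions)


def single_item_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Rules {a} -> {b} meeting both thresholds, as (a, b, support, confidence)."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        if s >= min_support:
            c = confidence({a}, {b}, transactions)
            if c >= min_confidence:
                rules.append((a, b, s, c))
    return rules


print(confidence({"milk"}, {"coke"}, TRANSACTIONS))  # 0.75
print(confidence({"diaper", "milk"}, {"beer"}, TRANSACTIONS))
```

A rule like {milk} -> {coke} predicts the occurrence of coke from the occurrence of milk; the rule is directional, so {coke} -> {milk} gets its own (here higher) confidence.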