Information Technology
ITEC 4230
Xiaofeng Zhou

4230 Final Notes

Lecture 1: Data Mining
o Extraction of interesting (nontrivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data

Knowledge Discovery Process (KDD)
o Learning the application domain
    Relevant prior knowledge and goals of the application
o Creating a target data set
o Data cleaning and preprocessing
o Data reduction and transformation
    Find useful features, dimensionality/variable reduction, invariant representation
o Choosing the data mining functions
    Summarization, classification, regression, association, clustering
o Choosing the mining algorithm
o Data mining
o Pattern evaluation and knowledge presentation
    Visualization, transformation, removing redundant patterns

Data Mining Functionalities
o Characterization/generalization, association and correlation analysis, classification, cluster analysis, outlier analysis, sequential pattern/trend analysis, structure and network analysis

Classification Schemes
o Different views lead to different classifications
    Data view: kinds of data to be mined
    Knowledge view: kinds of knowledge to be discovered
    Method view: kinds of techniques utilized
    Application view: kinds of applications adapted
o General functionality
    Descriptive data mining
    Predictive data mining

Classification
o Given a collection of records (training set)
    Each record contains a set of attributes; one of the attributes is the class
o Find a model for the class attribute as a function of the values of the other attributes
o Goals
    Previously unseen records should be assigned a class as accurately as possible
    A test set is used to determine the accuracy of the model
    A given data set is divided into training and test sets: the training set is used to build the model and the test set is used to validate it
o Example
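
The train/test workflow described under Classification can be sketched in a few lines of Python. This is an illustration only, not the example from the lecture: the records, the attribute meanings (age, income in thousands), the 70/30 split, and the choice of a 1-nearest-neighbour rule as the "model" are all assumptions made for the sketch.

```python
import random

# Hypothetical toy records (made up for this sketch):
# ((age, income in thousands), class label).
RECORDS = [
    ((22, 18), "no"), ((25, 20), "no"), ((27, 24), "no"),
    ((24, 21), "no"), ((29, 26), "no"),
    ((45, 80), "yes"), ((50, 90), "yes"), ((48, 85), "yes"),
    ((52, 95), "yes"), ((47, 88), "yes"),
]


def split_records(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict(training_set, attributes):
    """1-nearest-neighbour: return the class of the closest training record."""
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record[0], attributes))
    return min(training_set, key=squared_distance)[1]


def accuracy(training_set, test_set):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(
        predict(training_set, attrs) == label for attrs, label in test_set
    )
    return correct / len(test_set)


# Build the "model" from the training set only, then validate on the test set.
training_set, test_set = split_records(RECORDS)
test_accuracy = accuracy(training_set, test_set)
print(f"train={len(training_set)} test={len(test_set)} "
      f"accuracy={test_accuracy:.2f}")
```

The test records play the role of previously unseen records: they are never used when predicting each other's class, so the reported accuracy estimates how the model would do on new data. With real data the two classes would rarely be this cleanly separated, and accuracy would typically fall below 1.0.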
