ITEC 4230 Final: ITEC 4230 Final Notes

Document Summary

Data mining: extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data.

Knowledge discovery (KDD) process: learning the application domain (relevant prior knowledge and goals of the application); creating a target data set; data cleaning and preprocessing; data reduction and transformation (find useful features, dimensionality/variable reduction, invariant representation); choosing the functions of data mining (summarization, classification, regression, association, clustering); choosing the mining algorithm; data mining; pattern evaluation and knowledge presentation.

Data mining functionalities: characterization/generalization, association and correlation analysis, classification, cluster analysis, outlier analysis, sequential pattern/trend analysis, and structure and network analysis.

Classification schemes: different views lead to different classifications. Data view: kinds of data to be mined. Knowledge view: kinds of knowledge to be discovered. Application view: kinds of applications adapted. Data mining can also be classified by general functionality.

Classification: given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class, find a model for the class attribute as a function of the values of the other attributes. Goal: previously unseen records should be assigned a class as accurately as possible.
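The KDD steps listed above can be illustrated with a small end-to-end sketch. Everything in it is an assumption made for illustration: the transaction records, the `region` filter used to create the target data set, and all function names are hypothetical. Association (frequent item pairs) is picked as the data mining function, plain pair counting as the mining algorithm, and a minimum-support threshold as the only pattern-evaluation measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: each record is a customer transaction; None marks a corrupted entry.
RAW_DATA = [
    {"id": 1, "region": "east", "items": ["milk", "bread", "eggs"]},
    {"id": 2, "region": "west", "items": ["milk", "bread"]},
    {"id": 3, "region": "east", "items": None},
    {"id": 4, "region": "east", "items": ["bread", "eggs"]},
    {"id": 5, "region": "east", "items": ["milk", "bread", "butter"]},
]


def create_target_data(records, region):
    """Creating a target data set: select only the records relevant to the goal."""
    return [r for r in records if r["region"] == region]


def clean(records):
    """Data cleaning: drop records whose item list is missing."""
    return [r for r in records if r["items"] is not None]


def transform(records):
    """Data reduction/transformation: keep only the feature being mined (the item set)."""
    return [frozenset(r["items"]) for r in records]


def mine_pairs(transactions):
    """Data mining (assumed algorithm): count the transactions containing each item pair."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return counts


def evaluate(counts, n_transactions, min_support):
    """Pattern evaluation: keep pairs whose support (fraction of transactions) meets the threshold."""
    return {pair: c / n_transactions for pair, c in counts.items()
            if c / n_transactions >= min_support}


def kdd(records, region, min_support=0.5):
    target = create_target_data(records, region)
    transactions = transform(clean(target))
    patterns = evaluate(mine_pairs(transactions), len(transactions), min_support)
    # Knowledge presentation: print the surviving patterns, highest support first.
    for pair, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
        print(f"{pair}: support={support:.2f}")
    return patterns
```

For example, `kdd(RAW_DATA, "east")` keeps transactions 1, 4 and 5 (transaction 3 is dropped during cleaning) and reports the pairs ('bread', 'eggs') and ('bread', 'milk'), each appearing in 2 of the 3 transactions.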
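The classification setup described above (a training set of records, one class attribute, and a model of the class as a function of the other attributes) can be sketched with the simple 1R ("one rule") learner. 1R is swapped in here purely as an illustration and is not a method named in these notes; the toy training set and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training set: each record is a dict of attributes; "class" is the class attribute.
TRAINING_SET = [
    {"refund": "yes", "status": "single",   "class": "no"},
    {"refund": "no",  "status": "married",  "class": "no"},
    {"refund": "no",  "status": "single",   "class": "yes"},
    {"refund": "yes", "status": "married",  "class": "no"},
    {"refund": "no",  "status": "divorced", "class": "yes"},
    {"refund": "no",  "status": "married",  "class": "no"},
    {"refund": "no",  "status": "single",   "class": "yes"},
]


def train_one_rule(records, class_attr="class"):
    """1R: for each attribute, map each of its values to the majority class, then keep
    the attribute whose rule makes the fewest errors on the training set
    (on a tie, the attribute seen first is kept)."""
    attributes = [a for a in records[0] if a != class_attr]
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for r in records:
            by_value[r[attr]][r[class_attr]] += 1
        rule = {v: counts.most_common(1)[0][0] for v, counts in by_value.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in by_value.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    # Fallback for attribute values never seen in training: the overall majority class.
    default = Counter(r[class_attr] for r in records).most_common(1)[0][0]
    return best[0], best[1], default


def predict(model, record):
    """Assign a class to a previously unseen record using the learned rule."""
    attr, rule, default = model
    return rule.get(record.get(attr), default)
```

With this data, 1R picks `status` (1 training error, versus 2 for `refund`), so an unseen record with status "single" is assigned class "yes", while a status value never seen in training falls back to the majority class, "no". Measuring accuracy properly would need a separate set of records not used for training.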