COMMERCE 2KA3 Lecture Notes - Lecture 35: Structure Mining, Sentiment Analysis, Unstructured Data
Characteristics of high-quality information: relevant, accurate, complete, and timely.

Data redundancy: the presence of duplicate data in multiple data files, so that the same data are stored in more than one place or location. Redundancy leads to data inconsistency, where the same attribute has different values in different places.

Database: a collection of data organized to serve many applications efficiently by centralizing the data and controlling redundant data.

Database management system (DBMS): software that permits an organization to centralize data, manage them efficiently, and provide access to the stored data by application programs. Advantages: reduces redundancy and inconsistency, increases the accessibility and availability of information, reduces program development and maintenance costs, and increases data security.

Relational DBMS: the most popular type of DBMS. It represents data as two-dimensional tables (called relations), and each table contains data on one entity and its attributes.

Referential integrity: rules that ensure relationships between coupled tables remain consistent.
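To make redundancy, inconsistency, relations, and referential integrity concrete, here is a minimal sketch in Python using the standard `sqlite3` module, with SQLite standing in for a relational DBMS. Everything in it is hypothetical and chosen for illustration: the `supplier` and `part` tables, the sample values ("CBM Inc.", "Dayton"), and the helper functions `flat_file_records`, `find_inconsistencies`, `build_database`, `run`, and `part_cities`. One SQLite-specific assumption: foreign keys are enforced only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Hypothetical sample data: supplier details copied onto every part record,
# the way separate flat files often duplicate them (data redundancy).
def flat_file_records():
    return [
        {"part": "Door handle", "supplier": "CBM Inc.", "city": "Dayton"},
        {"part": "Door latch", "supplier": "CBM Inc.", "city": "Dayton"},
    ]

def find_inconsistencies(records):
    """Return suppliers whose 'city' attribute has more than one value."""
    cities = {}
    for r in records:
        cities.setdefault(r["supplier"], set()).add(r["city"])
    return sorted(name for name, values in cities.items() if len(values) > 1)

def build_database():
    """Two relations, each holding one entity and its attributes."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE supplier ("
        " supplier_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " city TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE part ("
        " part_id INTEGER PRIMARY KEY,"
        " part_name TEXT NOT NULL,"
        " supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id))"
    )
    with conn:
        # The supplier's city is stored once, not on every part row.
        conn.execute("INSERT INTO supplier VALUES (1, 'CBM Inc.', 'Dayton')")
        conn.executemany(
            "INSERT INTO part VALUES (?, ?, ?)",
            [(101, "Door handle", 1), (102, "Door latch", 1)],
        )
    return conn

def run(conn, sql, params=()):
    """Execute one statement; return False if an integrity rule rejects it."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

def part_cities(conn):
    """Join the two tables: each part reads the city from its supplier row."""
    return conn.execute(
        "SELECT p.part_name, s.city FROM part p"
        " JOIN supplier s ON p.supplier_id = s.supplier_id"
        " ORDER BY p.part_id"
    ).fetchall()

if __name__ == "__main__":
    records = flat_file_records()
    records[0]["city"] = "Cleveland"  # update only one of the copies
    print(find_inconsistencies(records))  # ['CBM Inc.']

    conn = build_database()
    run(conn, "UPDATE supplier SET city = 'Cleveland' WHERE supplier_id = 1")
    print(part_cities(conn))  # [('Door handle', 'Cleveland'), ('Door latch', 'Cleveland')]
    # A part pointing at a nonexistent supplier breaks referential integrity.
    print(run(conn, "INSERT INTO part VALUES (103, 'Hinge', 99)"))  # False
    # So does deleting a supplier that parts still reference.
    print(run(conn, "DELETE FROM supplier WHERE supplier_id = 1"))  # False
```

In the flat-file version, changing one copy of the city leaves the other copy stale, which is exactly the inconsistency described above. In the relational version the city lives in a single `supplier` row, so one update is seen by every part through the join, and the foreign key makes the DBMS reject orphaned or dangling references instead of relying on each application program to check them.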