Final Review Module 3 Course Pack Definitions and Characteristics
Text Unstructured data and an asset that can be managed.
Text Mining Consists of powerful software tools to discover and extract knowledge from text documents.
Assets Resources with recognized value that are under the control of an individual or organization.
Database Stores enterprise data that a company’s business applications create or generate, such as sales,
accounting, and employee data.
Data Warehouse Specialized type of database that aggregates data from transaction databases so it can be
Data Management A structured approach for capturing, storing, processing, integrating, distributing,
securing, and archiving data effectively throughout its life cycle.
Three general data principles illustrate the importance of the data life cycle perspective and guide IT
1. Principle of diminishing value The value of data diminishes as the data ages; the more recent the data, the
more valuable it is.
2. Principle of 90/90 data use A majority of stored data, as high as 90 percent, is seldom accessed after 90 days.
3. Principle of data in context End users need to see data in a meaningful format and context if the data is
to guide their decisions and plans.
Data Visualization Presenting data in ways that are faster and easier for users to understand.
Enterprise Portals A set of software applications that consolidate, manage, analyze, and transmit data to
users through a Web-based interface.
Master Data Management A process whereby companies integrate data from various sources or
enterprise applications to provide a more unified view of the data.
Data Entity Anything real or abstract about which a company wants to collect and store data.
Master Data Entities The main entities of a company, such as customers, products, suppliers, employees,
Data Mart A small data warehouse designed for a strategic business unit or a single department.
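Extract, Transform, Load A series of processes that extract data from source systems, transform it into a consistent format, and load it into a target such as a data warehouse, where it can be turned into knowledge.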
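The three steps named in Extract, Transform, Load can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the raw rows, the table name sales, and the function names extract, transform, and load are all hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import sqlite3

# Hypothetical raw rows, standing in for data pulled from a transaction system.
RAW_SALES = [
    {"region": " east ", "amount": "100.50"},
    {"region": "WEST", "amount": "20"},
    {"region": "East", "amount": "9.50"},
]

def extract():
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(RAW_SALES)

def transform(rows):
    """Transform: standardize region names and convert amounts to numbers."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in rows]

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract()), conn)

# Once loaded, the data can be aggregated for analysis.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
```

Without the transform step, " east " and "East" would be counted as two different regions, which is why cleaning happens before loading.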
Data Quality A measure of the data’s usefulness as well as the quality of the decisions based on the data.
The process of performing analysis on text to discover insights is similar to analyzing traditional data types.
1. Exploration Word counts, creating topics
2. Preprocessing Misspelled words, abbreviations
3. Categorizing and Modeling Building a decision tree, neural network, etc.
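The first two steps above can be sketched with the Python standard library. This is only an illustration under stated assumptions: the sample documents and the ABBREVIATIONS table are made up, and the abbreviation cleanup is applied before counting so that variants of the same word are counted together. The categorizing and modeling step is not shown.

```python
import re
from collections import Counter

# Hypothetical abbreviation table used during preprocessing.
ABBREVIATIONS = {"dept": "department", "mgr": "manager"}

# Hypothetical sample documents.
docs = [
    "The dept mgr approved the budget.",
    "Budget approved by the department manager.",
]

def preprocess(text):
    """Lowercase the text, split it into words, and expand known abbreviations."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [ABBREVIATIONS.get(t, t) for t in tokens]

def word_counts(documents):
    """Exploration: count how often each word appears across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(preprocess(doc))
    return counts

counts = word_counts(docs)
```

Here "dept" and "department" end up under one entry, so the counts reflect the concept rather than its spelling.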
Business Records Documents that record business dealings such as contracts, research and development,
accounting source documents, memos, customer/client communications, and meeting minutes.
Document Management The automated control of imaged and electronic documents, page images,
spreadsheets, voice and email messages, word-processing documents, and other documents through their
life cycle in an organization, from initial creation to final archiving or destruction.
Document Management Systems Consist of hardware and software that manage and archive electronic
documents and also convert paper documents into e-documents and then index and store them according to
Green Computing An initiative to conserve our valuable natural resources by reducing the effects of our
computer usage on the environment.
File Management Systems
Bit Represents the smallest unit of data a computer can process.
Byte A group of 8 bits that represents a single character, which can be a letter, a number, or a symbol.
Field Characters that are combined to form a word, a group of words, or a complete number.
Record Related fields, such as vendor name, address, and account data.
File A collection of related records.
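The hierarchy from bit up to file can be illustrated in Python. This is a rough sketch with made-up vendor data; it assumes ASCII encoding, where one character fits in one byte (other encodings, such as UTF-8, may use more than one byte for some characters).

```python
# Bit and byte: one ASCII character is stored as one byte, which is 8 bits.
char = "A"
encoded = char.encode("ascii")      # b"A", a single byte
bits = format(encoded[0], "08b")    # the 8 bits that make up that byte

# Field: characters combined into a word or number.
vendor_name = "Acme Supply"         # hypothetical value

# Record: related fields describing one vendor.
record = {"vendor_name": vendor_name, "address": "12 Main St", "account": "V-1001"}

# File: a collection of related records.
vendor_file = [
    record,
    {"vendor_name": "Bolt Hardware", "address": "48 Oak Ave", "account": "V-1002"},
]
```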
Primary Key An attribute (field) that uniquely identifies a record.
Secondary Key Nonunique fields that have some identifying information.
Foreign Key A field whose purpose is to link two or more tables together.
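The three kinds of keys can be seen side by side in a small SQLite example. This is a sketch, not a recommended schema: the vendor and invoice tables and all their columns are hypothetical. Here vendor_id is the primary key of vendor, city acts as a secondary key, and invoice.vendor_id is the foreign key that links the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.execute("CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    """CREATE TABLE invoice (
           invoice_id INTEGER PRIMARY KEY,
           vendor_id  INTEGER REFERENCES vendor(vendor_id),
           amount     REAL)"""
)
conn.executemany("INSERT INTO vendor VALUES (?, ?, ?)",
                 [(1, "Acme", "Dallas"), (2, "Bolt", "Dallas")])
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 1, 75.0), (12, 2, 40.0)])

# Primary key: identifies exactly one record.
one_vendor = conn.execute(
    "SELECT name FROM vendor WHERE vendor_id = ?", (1,)).fetchall()

# Secondary key: nonunique, so it can match several records.
dallas_vendors = conn.execute(
    "SELECT name FROM vendor WHERE city = ? ORDER BY name", ("Dallas",)).fetchall()

# Foreign key: links each invoice to its vendor.
joined = conn.execute(
    """SELECT v.name, i.amount
       FROM invoice i JOIN vendor v ON v.vendor_id = i.vendor_id
       ORDER BY i.invoice_id""").fetchall()

# An invoice pointing at a vendor that does not exist is rejected.
try:
    conn.execute("INSERT INTO invoice VALUES (13, 99, 5.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```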
Sequential File Organization Data records must be retrieved in the same physical sequence in which
they are stored.
Direct File Organization Records can be accessed directly regardless of their location on the storage medium.
Indexed Sequential Access Method Uses an index of key fields to locate individual records.
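The difference between the three access methods can be sketched in Python. This is a simplified, in-memory illustration with made-up records: real file systems work with disk blocks, and the dictionary and sorted key list used here are only stand-ins for direct addressing and an index.

```python
import bisect

# Hypothetical records stored in key order: (key, vendor name).
records = [(101, "Acme"), (205, "Bolt"), (309, "Cord")]

def sequential_read(key):
    """Sequential: read records in stored order until the key is found."""
    reads = 0
    for rec in records:
        reads += 1
        if rec[0] == key:
            return rec, reads
    return None, reads

# Direct: the key leads straight to the record's position, wherever it is stored.
position_of = {key: pos for pos, (key, _) in enumerate(records)}

def direct_read(key):
    pos = position_of.get(key)
    return None if pos is None else records[pos]

# Indexed sequential: search a sorted index of key fields, then fetch the record.
index_keys = [key for key, _ in records]

def indexed_read(key):
    pos = bisect.bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return records[pos]
    return None
```

Finding the last record sequentially takes as many reads as there are records, while the direct and indexed lookups go to it without scanning the records that precede it.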
Problems with traditional file organization:
• Data Redundancy The presence of duplicate data in multiple data files, so that the same data
are stored in more than one location.
• Data Inconsistency The same attribute may have different values.
• Data Isolation, Lack of Data Sharing Availability Information cannot flow freely across
different functional ar