Chapter 4 – Data and Knowledge
Information is being collected at an alarming rate today. Data is being collected through emails,
webpages, credit card swipes, phone messages, stock trades, memos, address books, and
Information technologies help to acquire, organize, store, access, analyze, and interpret data.
These data become information and then knowledge. Managing data also raises multiple problems.
Database management systems help managers solve these problems, and data warehouses have
become important because they provide the data managers need to make decisions.
4.1 MANAGING DATA – data should be of high quality, meaning it should be accurate,
complete, timely, consistent, accessible, relevant, and concise.
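Two of these quality dimensions, completeness and timeliness, can be checked mechanically. The sketch below uses made-up customer records; the field names, the `missing_fields` and `is_timely` helpers, and the one-year freshness threshold are all hypothetical choices for illustration.

```python
from datetime import date

# Hypothetical required fields for a customer record (an assumption, not a standard).
REQUIRED_FIELDS = ("customer_id", "name", "email", "last_updated")

def missing_fields(record):
    """Completeness: list required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]

def is_timely(record, today, max_age_days=365):
    """Timeliness: the record was updated within max_age_days of today."""
    return (today - record["last_updated"]).days <= max_age_days

records = [
    {"customer_id": 1, "name": "Ana", "email": "ana@example.com",
     "last_updated": date(2024, 1, 10)},
    {"customer_id": 2, "name": "Ben", "email": "",
     "last_updated": date(2020, 5, 2)},
]

today = date(2024, 6, 1)
for r in records:
    print(r["customer_id"], missing_fields(r), is_timely(r, today))
```

The second record fails both checks: its email is empty and it has not been updated in years, which is also an example of data decay.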
The Difficulties of Managing Data – the amount of data increases over time, and historical data
must be kept for long periods of time. New data is added rapidly. Data is collected through
various means and stored on servers in different formats and languages, and its security is
The different sources of data are internal sources, personal sources, and external sources. Data
also comes in the form of clickstream data, which visitors produce when they visit websites and
click on hyperlinks. Data decays over time as people's information changes, new products are
introduced, and so on.
The Data Life Cycle – businesses run on data that has been processed into information and
knowledge. Data is first collected from different sources. It is then converted to fit the format
of a data warehouse. Finally, the data is analyzed, and the results provide solutions to the
problems for which the data was collected in the first place, i.e. in the form of general support and
4.2 THE DATABASE APPROACH – a database (a collection of related data) together with a database
management system minimizes many problems, such as:
• data redundancy – the same data stored in many places
• data isolation – applications cannot access data associated with other applications
• data inconsistency – various copies of the data do not agree
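The link between redundancy and inconsistency can be shown in a few lines of Python. In this sketch, two hypothetical applications (`billing_app`, `shipping_app`) each keep their own copy of a customer's address; the names and data are made up, and the "shared database" is simply one dictionary both views point to, standing in for a DBMS.

```python
# Redundancy: each application stores its own copy of the same address.
billing_app = {"C001": {"address": "12 Oak St"}}
shipping_app = {"C001": {"address": "12 Oak St"}}

# The customer moves, but only the billing copy is updated.
billing_app["C001"]["address"] = "98 Elm Ave"

def copies_agree(customer_id):
    """Inconsistency check: do the redundant copies still match?"""
    return billing_app[customer_id]["address"] == shipping_app[customer_id]["address"]

print(copies_agree("C001"))  # the copies now disagree

# Database approach (simplified): both applications read one shared record,
# so a single update is visible to every application.
shared_db = {"C001": {"address": "12 Oak St"}}
billing_view = shared_db
shipping_view = shared_db
shared_db["C001"]["address"] = "98 Elm Ave"
print(billing_view["C001"]["address"], shipping_view["C001"]["address"])
```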
Database systems also increase security and data integrity. They also provide data independence:
applications and data are not linked to each other, so all applications can access the same data.
The Data Hierarchy – Data are organized in a hierarchy. The bit is the smallest unit of
data. Eight bits make a byte, and a byte can represent one character. A logical grouping of characters is called a field. A logical grouping of related fields is called a record.
A grouping of related records is called a file or table. A collection of related files makes up a database.
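Each level of the hierarchy can be mirrored with ordinary Python values, as a rough sketch. It assumes ASCII encoding, where one character fits in one byte (other encodings such as UTF-8 may use several bytes for non-English characters); the student data and attribute names are made up.

```python
char = "A"
byte = char.encode("ascii")      # one character stored as one byte
bits = format(byte[0], "08b")    # the eight bits that make up that byte
print(char, byte[0], bits)

field = "Smith"                  # field: a logical grouping of characters
record = {"student_id": "S1", "last_name": "Smith", "major": "MIS"}  # related fields
table = [                        # file/table: a grouping of related records
    record,
    {"student_id": "S2", "last_name": "Lee", "major": "Finance"},
]
database = {"students": table, "courses": []}  # database: a collection of related tables
```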
Designing the Database – to be useful, data must be easy to understand and analyze. A data model
is a diagram that represents the entities in the database and their
relationships. An entity is a person, place, thing, or event. Each characteristic of a particular
entity is called an attribute; for example, a customer name or an employee number. A
field that uniquely identifies a record is called a primary key, for example, a Social Security
number (SSN). Secondary keys also help identify records, but unlike a primary key they do not
identify a record uniquely.
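The difference between a primary key and a secondary key comes down to uniqueness. This sketch checks whether a field's values are unique within a small set of hypothetical employee records (the SSNs are fictitious placeholders, and `identifies_uniquely` is an illustrative helper). Note that a real primary key must be unique by rule, not just by chance in the current data.

```python
employees = [
    {"ssn": "000-00-0001", "last_name": "Garcia", "dept": "Sales"},
    {"ssn": "000-00-0002", "last_name": "Garcia", "dept": "IT"},
    {"ssn": "000-00-0003", "last_name": "Chen", "dept": "Sales"},
]

def identifies_uniquely(records, field):
    """True if no two records share a value in `field`."""
    values = [r[field] for r in records]
    return len(values) == len(set(values))

print(identifies_uniquely(employees, "ssn"))        # usable as a primary key
print(identifies_uniquely(employees, "last_name"))  # secondary key only

# A secondary key narrows the search but can return several records.
garcias = [r for r in employees if r["last_name"] == "Garcia"]
print(len(garcias))
```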
• Entity-relationship modeling – databases are planned through a process called
entity-relationship (ER) modeling, using an entity-relationship diagram that consists
of entities, attributes, and relationships. Entities are shown in boxes and relationships
are shown in diamonds. The number of entities in a relationship is the degree of the
relationship. A relationship between two entities is called a binary relationship. Binary
relationships can be one-to-one, one-to-many, or many-to-many. A one-to-one (1:1) relationship is,
for example, a student and his or her parking permit. A one-to-many (1:M) relationship is a
professor and his or her students. A many-to-many (M:M) relationship is, for example, students and
classes: a student takes many classes, and each class has many students. ER diagrams help us
represent all of the relationships among entities.
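One common way to carry these three relationship types into actual tables is sketched below with Python's built-in `sqlite3` module. It is an illustration under assumptions: the table and column names are hypothetical, the 1:M professor-to-students link is modeled as each student pointing to one professor (for example, an advisor), the 1:1 link is enforced with a UNIQUE foreign key, and the M:M link uses a junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled

conn.executescript("""
CREATE TABLE professors (professor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- 1:M  each student row points to one professor; a professor has many students
CREATE TABLE students (
    student_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    professor_id INTEGER REFERENCES professors(professor_id)
);

-- 1:1  UNIQUE on the foreign key allows at most one permit per student
CREATE TABLE parking_permits (
    permit_id  INTEGER PRIMARY KEY,
    student_id INTEGER NOT NULL UNIQUE REFERENCES students(student_id)
);

-- M:M  a junction table pairs students with classes
CREATE TABLE classes (class_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    class_id   INTEGER REFERENCES classes(class_id),
    PRIMARY KEY (student_id, class_id)
);

INSERT INTO professors VALUES (1, 'Dr. Ortiz');
INSERT INTO students VALUES (10, 'Ana', 1), (11, 'Ben', 1);
INSERT INTO parking_permits VALUES (100, 10);
INSERT INTO classes VALUES (500, 'Databases'), (501, 'Statistics');
INSERT INTO enrollments VALUES (10, 500), (10, 501), (11, 500);
""")

# The 1:1 rule is enforced: a second permit for student 10 is rejected.
second_permit_rejected = False
try:
    conn.execute("INSERT INTO parking_permits VALUES (101, 10)")
except sqlite3.IntegrityError:
    second_permit_rejected = True

def count(sql, *params):
    """Run a COUNT(*) query and return the number."""
    return conn.execute(sql, params).fetchone()[0]

advisees = count("SELECT COUNT(*) FROM students WHERE professor_id = ?", 1)
students_in_databases = count("SELECT COUNT(*) FROM enrollments WHERE class_id = ?", 500)
classes_for_ana = count("SELECT COUNT(*) FROM enrollments WHERE student_id = ?", 10)
print(second_permit_rejected, advisees, students_in_databases, classes_for_ana)
```

The junction table is the usual design choice for M:M relationships, because neither the student row nor the class row can hold a variable-length list of the other.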
4.3 DATABASE MANAGEMENT SYSTEMS – A database management system (DBMS) allows
users to manipulate data stored in one location. A DBMS also lets organizations manage security for the
data. We focus on the relational database model since it is simple and popular.
The Relational Database Model – data is organized in columns and rows, and a value is retrieved by
finding the intersection of a row and a column. Each table contains records (listed in rows) and
attributes (listed in columns). The model is based on the concept of two-dimensional tables, each of
which resembles a flat file. Large-scale databases may have many interrelated tables and can have slow access
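To make "the intersection of a row and a column" concrete, here is a minimal Python sketch that stores a hypothetical student table as a tuple of column names (attributes) plus a list of row tuples (records). The data and the `cell` helper are illustrative assumptions, not part of any DBMS.

```python
columns = ("student_id", "name", "major", "grad_year")
rows = [
    (1, "Ana", "MIS", 2014),
    (2, "Ben", "Finance", 2015),
    (3, "Cai", "MIS", 2014),
]

def cell(student_id, attribute):
    """Return the value where the row for student_id meets the attribute's column."""
    col = columns.index(attribute)
    for row in rows:
        if row[0] == student_id:  # column 0 holds the primary key
            return row[col]
    raise KeyError(student_id)

print(cell(2, "major"))
```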
• Query Languages – requesting data from a database is the most commonly performed
operation. Structured Query Language (SQL) is the most popular query language. An SQL
query uses keywords such as SELECT (which attributes to return), FROM (which tables to
search), and WHERE (which conditions the rows must meet) to specify the data wanted. A single
query can also retrieve many records at once, for example, the names of all students graduating
in 2014. Query by example (QBE) is another way to find information: the user fills out a grid or
template to construct a sample description of the data wanted.
• Data dictionary – the data dictionary defines the format necessary to enter the data into
the database. It provides names and standard definitions for all attributes, which reduces
the chance that the same attribute will be used in different applications under a different name.
• Normalization – normalization is a process for streamlining a relational database to reduce
redundancy and improve data integrity. When data are normalized, the attributes in a table depend
only on the primary key.
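The "students graduating in 2014" query from the Query Languages bullet can be run against an in-memory SQLite database using Python's built-in `sqlite3` module. The table name, columns, and data are hypothetical; the `?` placeholder passes 2014 as a parameter rather than pasting it into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT, grad_year INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Ana", 2014), (2, "Ben", 2015), (3, "Cai", 2014)],
)

# SELECT names the attribute, FROM names the table, WHERE filters the rows.
graduates_2014 = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM students WHERE grad_year = ? ORDER BY name", (2014,)
    )
]
print(graduates_2014)
```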
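The following sketch gives a simplified picture of one normalization step, using hypothetical data. In the unnormalized table, `dept_office` depends on `major` rather than on the primary key `student_id`, so it is repeated for every student in that major. Splitting it into a separate majors table makes every attribute depend only on its own table's key. This shows a single step (removing a dependency on a non-key attribute), not the full normalization procedure.

```python
# Unnormalized: dept_office is repeated for every MIS student.
unnormalized = [
    {"student_id": 1, "name": "Ana", "major": "MIS", "dept_office": "BUS 210"},
    {"student_id": 2, "name": "Ben", "major": "Finance", "dept_office": "BUS 105"},
    {"student_id": 3, "name": "Cai", "major": "MIS", "dept_office": "BUS 210"},
]

# Normalized: students keyed by student_id, majors keyed by major.
# (Assumes the repeated dept_office values already agree; later rows overwrite earlier ones.)
students = {r["student_id"]: {"name": r["name"], "major": r["major"]} for r in unnormalized}
majors = {r["major"]: {"dept_office": r["dept_office"]} for r in unnormalized}

# The office is now stored once per major, so an update happens in one place.
majors["MIS"]["dept_office"] = "BUS 300"

def office_of(student_id):
    """Rebuild the original fact by joining students to majors on 'major'."""
    return majors[students[student_id]["major"]]["dept_office"]

print(office_of(1), office_of(3))
```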
Databases in Action – all organizations have one or more databases.
4.4 DATA WAREHOUSING – successful companies always respond quickly to ch