To deal with growth and the diverse nature of digital data, organizations must employ sophisticated
techniques for information management. Information technologies and systems support organizations in
managing – that is, acquiring, organizing, storing, accessing, analyzing, and interpreting – data. The first
requisite to providing sound information that is useful to managers and staff is that data need to be of
high quality, meaning that they should be accurate, complete, timely, consistent, accessible, relevant, and
concise. Unfortunately, however, the process of acquiring, keeping, and managing data is becoming
increasingly difficult. Let’s see the reasons why.
The difficulties of managing data – firstly, the amount of data increases exponentially over time.
A lot of historical data must be kept for a long time.
In addition, data are scattered throughout organizations and collected by many individuals using
various methods and devices.
Another problem is that data come from multiple sources: internal sources, personal sources,
and external sources. Data also come from the web, in the form of clickstream data.
Clickstream data are data that visitors and customers produce when they visit a website and
click on hyperlinks.
Another issue with data is that they degrade over time. Data are also subject to data rot. Data rot
refers primarily to problems with the media on which the data are stored. Over time,
temperature, humidity, and exposure to light can cause physical problems with storage media
and thus make it difficult to access data. The second aspect of data rot is that finding the
machines needed to access the data can be difficult.
Another problem with managing data is data errors – information that is out of date, inaccurate,
or technically corrupt.
Data security, quality, and integrity are critical yet easily jeopardized, which makes the process of
managing data more difficult. In addition, legal requirements relating to data differ among
countries and industries, and they change periodically.
The Data Lifecycle – businesses run on data that have been transformed into information and
knowledge. Managers then apply this knowledge to business problems and opportunities.
Businesses transform data into knowledge and solutions in several ways. The general process is
referred to as the data lifecycle. It shows how organizations process and manage data to make
decisions, generate knowledge, and use that knowledge in a variety of applications. It starts with the
collection of data from various sources: internal data, external data, and personal data.
The data are stored in one or more databases. Selected data from the organization’s databases
are then processed to fit the format of a data warehouse or data mart. Users then access the
data in the warehouse or data mart for analysis. The analysis is performed with data analysis
tools, which look for patterns, and with intelligent systems, which support data
interpretation. These activities ultimately generate knowledge that can be used to support
THE DATABASE APPROACH
A database management system (DBMS) is a set of programs that provides users with tools to add,
delete, access, and analyze data stored in one location. DBMSs also provide the mechanisms for
maintaining the integrity of stored data, managing security and user access, and recovering information if
the system fails.
Using databases eliminates many problems that arose from previous methods of
storing and accessing data.
In general, database management systems help minimize the following problems:
o Data Redundancy – the same data are stored in many places
o Data Isolation – applications cannot access data associated with other applications
o Data Inconsistency – various copies of the data do not agree
In addition, database systems maximize the following:
o Data Security – because data are essential to organizations, databases have extremely
high security measures in place to deter mistakes and attacks
o Data Integrity – data meet certain constraints, such as no alphabetic characters in a
social insurance number field.
o Data Independence – applications and data are independent of one another
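The data integrity point above can be illustrated with a constraint that the DBMS itself enforces. The following is a minimal sketch using Python's built-in sqlite3 module. The table name employee, the column name sin, and the nine-digit rule are assumptions made for this illustration and do not come from the notes.

```python
import sqlite3

# In-memory database used only for this illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK constraint rejects any value in the
# social insurance number field that is not exactly nine digits.
conn.execute(
    """
    CREATE TABLE employee (
        name TEXT NOT NULL,
        sin  TEXT NOT NULL
             CHECK (length(sin) = 9 AND sin NOT GLOB '*[^0-9]*')
    )
    """
)


def try_insert(name, sin):
    """Return True if the row satisfies the constraint, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee (name, sin) VALUES (?, ?)", (name, sin)
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("Ana", "123456789")  # digits only
rejected = try_insert("Ben", "12345678A")  # contains a letter
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(accepted, rejected, row_count)
```

Because the rule lives in the table definition, every application that writes to the table is held to it, which is the sense in which the database, rather than each program, maintains integrity.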
The Data Hierarchy – data in databases are arranged in a hierarchy in order to make them more
understandable and useful. A bit (binary digit) represents the smallest unit of data a computer
can process. The term binary means that a bit can consist only of a 0 or a 1. A group of eight
bits, called a byte, represents a single character. A byte can be a letter, a number, or a symbol.
A logical grouping of characters into a word, a small group of words, or an identification number
is called a field. A logical grouping of related fields, such as a student’s name, the courses taken,
the date, and the grade, constitutes a record. A logical grouping of related records is called a file or
table. A logical grouping of related tables would constitute a database.
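The levels of the hierarchy can be mirrored with ordinary Python values. This is only an illustrative sketch: the student names, course codes, and the use of ASCII for the character encoding are assumptions, and a real DBMS stores these structures very differently.

```python
# Bit and byte: one character stored as eight bits (ASCII assumed).
char = "A"
bits = format(ord(char), "08b")  # the eight bits that make up the byte

# Field: a logical grouping of characters.
field = "Smith"

# Record: a logical grouping of related fields (hypothetical student record).
record = {"name": "Smith", "course": "MIS 101", "date": "2024-05-01", "grade": "A"}

# File or table: a logical grouping of related records.
table = [
    record,
    {"name": "Lee", "course": "MIS 101", "date": "2024-05-01", "grade": "B"},
]

# Database: a logical grouping of related tables.
database = {
    "grades": table,
    "courses": [{"course": "MIS 101", "title": "Intro to MIS"}],
}

print(char, bits, len(bits))
print(len(record), "fields per record;", len(table), "records;", len(database), "tables")
```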
Designing the Database – to be valuable, a database must be organized so that users can
retrieve, analyze and understand the data they need. A key to designing an effective database is
the data model. A data model is a diagram that represents entities in the database and their
relationships. An entity (previously known as a record) is a person, place, thing, or event – such as
a customer, an employee, or a product – about which information is maintained. Entities can
typically be identified in the users’ work environment. A record generally describes an entity.
Each characteristic or quality of a particular entity (previously called a field) is called an attribute
in the context of data modeling.
Every record in a table must contain at least one attribute/field that uniquely identifies that
record so that it can be retrieved, updated, and sorted. This identifier is called a primary key.
Secondary keys are other fields that have some identifying information but typically do not
identify the record or entity with complete accuracy.
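The difference between the two kinds of key shows up when records are looked up. The sketch below uses Python's built-in sqlite3 module with a hypothetical student table; the column names and sample rows are assumptions. Here student_id plays the role of the primary key and major plays the role of a secondary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# student_id is the primary key: it uniquely identifies each record.
# major is a secondary key: useful for lookups, but many students share it.
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, major TEXT)"
)
conn.execute("CREATE INDEX idx_student_major ON student (major)")

conn.executemany(
    "INSERT INTO student (student_id, name, major) VALUES (?, ?, ?)",
    [(1, "Ana", "Finance"), (2, "Ben", "Finance"), (3, "Cy", "Marketing")],
)

# Lookup by primary key returns at most one record.
by_primary = conn.execute(
    "SELECT name FROM student WHERE student_id = ?", (2,)
).fetchall()

# Lookup by secondary key may return several records.
by_secondary = conn.execute(
    "SELECT name FROM student WHERE major = ? ORDER BY student_id", ("Finance",)
).fetchall()

# A duplicate primary key value is rejected by the DBMS.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Dup', 'Finance')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(by_primary, by_secondary, duplicate_allowed)
```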
Entity Relationship Model – designers plan and develop the database through a process called
entity-relationship (ER) modeling, using an entity-relationship diagram. Users are likely to be
asked to review the ER diagram to make sure it includes all the data they need in order to obtain the
information they need to perform their job. ER diagrams consist of entities, attributes, and
relationships. Entities are pictured in boxes while relationships are pictured in diamonds. The
attributes for each entity are listed next to the entity, and the primary key is underlined.
As defined earlier, an entity is something that can be identified in the users’ work environment.
Entities of a given type are grouped in entity classes. An instance of an entity class is the
representation of one particular entity. Therefore a particular STUDENT is an instance of the
student entity class; a particular parking permit is an instance of the parking permit entity class.
Entity instances have identifiers, which are attributes (or fields) that are unique to that entity
instance. An identifier is another name for the primary key.
Entities are associated with one another in relationships, which can include many entities. The
number of entities in a relationship is the degree of the relationship. Relationships between two
entities are called binary relationships. There are three types of binary relationships: one-to-one,
one-to-many, and many-to-many.
In a one-to-one relationship, a single entity instance of one type is related to a single entity
instance of another type.
The second type of relationship is one-to-many. For example, a professor can have many
classes, but each class can have only one professor.
The third type of relationship is many-to-many. For example, a student can have
many classes, and a class can have many students.
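One common way to carry the three binary relationship types into tables is sketched below with Python's built-in sqlite3 module. The table and column names are hypothetical, and pairing a student with a parking permit as the one-to-one case is an assumption based on the entity classes mentioned earlier. A UNIQUE foreign key models one-to-one, a plain foreign key models one-to-many, and a junction table models many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript(
    """
    CREATE TABLE student   (student_id   INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE professor (professor_id INTEGER PRIMARY KEY, name TEXT);

    -- One-to-one: UNIQUE means a student can hold at most one permit.
    CREATE TABLE parking_permit (
        permit_id  INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL UNIQUE REFERENCES student (student_id)
    );

    -- One-to-many: each class row points to exactly one professor,
    -- while one professor may appear in many class rows.
    CREATE TABLE class (
        class_id     INTEGER PRIMARY KEY,
        title        TEXT,
        professor_id INTEGER NOT NULL REFERENCES professor (professor_id)
    );

    -- Many-to-many: a junction table pairs students with classes.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        class_id   INTEGER NOT NULL REFERENCES class (class_id),
        PRIMARY KEY (student_id, class_id)
    );

    INSERT INTO student   VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO professor VALUES (10, 'Dr. Cole');
    INSERT INTO class     VALUES (100, 'Databases', 10), (101, 'Networks', 10);
    INSERT INTO parking_permit VALUES (500, 1);
    INSERT INTO enrollment VALUES (1, 100), (1, 101), (2, 100);
    """
)

# The one-to-one rule is enforced: a second permit for student 1 fails.
try:
    conn.execute("INSERT INTO parking_permit VALUES (501, 1)")
    second_permit_allowed = True
except sqlite3.IntegrityError:
    second_permit_allowed = False

classes_for_professor = conn.execute(
    "SELECT COUNT(*) FROM class WHERE professor_id = 10"
).fetchone()[0]
classes_for_ana = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE student_id = 1"
).fetchone()[0]
students_in_databases = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE class_id = 100"
).fetchone()[0]

print(second_permit_allowed, classes_for_professor, classes_for_ana, students_in_databases)
```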
RELATIONAL DATABASE MANAGEMENT SYSTEMS
Relational Database Model – based on the concept of two-dimensional tables. A relational
database generally is not one big table – usually called a flat file – that contains all of the records
and attributes. Such a design would entail far too much data redundancy. Instead, a relational
database is usually designed with a number of related tables. Each of these tables contains
entities (as records listed in rows) and attributes (as fields listed in columns). These related tables
can be joined when they contain common columns. The uniqueness of the primary key tells the
DBMS which records are joined with others in related tables.
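Joining related tables on a common column might look like the following sketch, written with Python's built-in sqlite3 module. The student and result tables and their contents are hypothetical. The shared student_id column, which is the primary key of student, is what lets the DBMS match each result row to the right student.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE result (
        student_id INTEGER,   -- common column shared with student
        course     TEXT,
        grade      TEXT
    );
    INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO result VALUES
        (1, 'MIS 101', 'A'), (1, 'ACC 200', 'B'), (2, 'MIS 101', 'C');
    """
)

# Join the two tables on the common column student_id.
joined_rows = conn.execute(
    """
    SELECT student.name, result.course, result.grade
    FROM student
    JOIN result ON result.student_id = student.student_id
    ORDER BY student.student_id, result.course
    """
).fetchall()

for row in joined_rows:
    print(row)
```

Each student's name is stored once in student, yet it appears next to every matching grade in the joined output, which is how related tables avoid the redundancy of a flat file.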
Because large-scale databases can be composed of many interrelated tables, the overall design
can be complex and therefore have slow search and access times.
Query Languages – requesting information from a database is the most commonly performed
operation. Structured Query Language (SQL) is the most popular query language used to request
information. SQL allows people to perform complicated searches by using relatively simple
statements or key words. Typical key words are SELECT (to specify a desired attribute), FROM (to
specify the table to be used), and WHERE (to specify conditions to apply in the query).
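A query built from the three key words might look like this. The sketch runs the SQL through Python's built-in sqlite3 module; the student table, its columns, and the 3.5 threshold are assumptions, and ORDER BY is added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ana", 3.8), (2, "Ben", 2.9), (3, "Cy", 3.5)],
)

# SELECT names the desired attribute, FROM names the table,
# and WHERE states the condition each returned record must meet.
query = "SELECT name FROM student WHERE gpa >= ? ORDER BY name"
honor_students = [row[0] for row in conn.execute(query, (3.5,))]
print(honor_students)
```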
Another way to find information in a database is to use Query by Example (QBE). In QBE, the user
fills out a grid or template (known as a form) to construct a sample or description of the data he
or she wants.
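The idea behind QBE can be imitated in a few lines of Python. This is a loose sketch, not how any particular QBE tool works: real tools translate the filled-in grid into a query that the DBMS executes, whereas the hypothetical query_by_example function below simply filters an in-memory list.

```python
def query_by_example(records, template):
    """Return the records whose fields match every filled-in template cell.

    Cells left as None are treated as blank and place no restriction.
    """
    filled = {field: value for field, value in template.items() if value is not None}
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in filled.items())
    ]


students = [
    {"name": "Ana", "major": "Finance", "year": 2},
    {"name": "Ben", "major": "Finance", "year": 3},
    {"name": "Cy", "major": "Marketing", "year": 2},
]

# The user fills in only the cells they care about.
template = {"name": None, "major": "Finance", "year": 2}
matches = query_by_example(students, template)
print([m["name"] for m in matches])
```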
Data Dictionary – when a relational model is created, the data dictionary defines the format
necessary to enter the data into the database. The data dictionary provides information on each
attribute, such as its name, whether it is a key or part of a key, the type of data expected, and
valid values. Data dictionaries can also provide information on how often the attribute should be
updated, why it is needed in the database, and which business functions, applications, forms,
and reports use the attribute.
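A data dictionary can be pictured as one entry per attribute, recording its type, whether it is a key, and its valid values. The sketch below is a simplified illustration: the attribute names, the rules, and the validation_errors helper are all hypothetical, and a real DBMS keeps this information in its own catalog.

```python
# Hypothetical data dictionary for a student table: one entry per attribute.
data_dictionary = {
    "student_id": {"type": int, "is_key": True, "valid": lambda v: v > 0},
    "name": {"type": str, "is_key": False, "valid": lambda v: len(v) > 0},
    "grade": {
        "type": str,
        "is_key": False,
        "valid": lambda v: v in {"A", "B", "C", "D", "F"},
    },
}


def validation_errors(record):
    """List the ways a record violates the data dictionary (empty if none)."""
    errors = []
    for attribute, rules in data_dictionary.items():
        if attribute not in record:
            errors.append(f"{attribute}: missing")
        elif not isinstance(record[attribute], rules["type"]):
            errors.append(f"{attribute}: wrong type")
        elif not rules["valid"](record[attribute]):
            errors.append(f"{attribute}: invalid value")
    return errors


good = validation_errors({"student_id": 7, "name": "Ana", "grade": "A"})
bad = validation_errors({"student_id": 7, "name": "Ana", "grade": "Z"})
print(good, bad)
```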
Normalization – in order to use a relational database management system effectively, the data
must be analyzed to eliminate redundant data elements. Normalization is a method for analyzing
and reducing a relational database to its most streamlined form for minimum redundancy,
maximum data integrity, and best processing performance. When data are normalized, attributes
in the table depend only on the primary key.
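The effect of normalization can be seen by splitting a small flat file into tables in which every attribute depends only on that table's primary key. The sketch below uses plain Python dictionaries rather than a DBMS, and the student and course data are invented for the illustration.

```python
# Flat file: student and course details are repeated on every row.
flat_file = [
    {"student_id": 1, "student_name": "Ana", "course_id": "MIS101",
     "course_title": "Intro to MIS", "grade": "A"},
    {"student_id": 1, "student_name": "Ana", "course_id": "ACC200",
     "course_title": "Accounting", "grade": "B"},
    {"student_id": 2, "student_name": "Ben", "course_id": "MIS101",
     "course_title": "Intro to MIS", "grade": "C"},
]

students = {}     # primary key: student_id
courses = {}      # primary key: course_id
enrollments = {}  # primary key: (student_id, course_id)

for row in flat_file:
    # Each attribute is stored once, in the table whose key it depends on.
    students[row["student_id"]] = {"student_name": row["student_name"]}
    courses[row["course_id"]] = {"course_title": row["course_title"]}
    enrollments[(row["student_id"], row["course_id"])] = {"grade": row["grade"]}

print(len(students), len(courses), len(enrollments))
```

In the flat file, the name "Ana" and the title "Intro to MIS" each appear twice. After the split, each is stored once, so a correction has to be made in only one place.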
Today, the most successful organizations are those that can respond quickly and flexibly to market
changes and opportunities. As we know, a key to this response is the effective and efficient use of data
and information by managers and employees. There are two reasons organizations are building data warehouses.
First, databases have