Chapter 6 Foundations of Business Intelligence: Databases and Information Management 267
maintained multiple times in a database. Your name may have been misspelled,
or you may have used your middle initial on one occasion and not on another, or
the information may have been initially entered onto a paper form and not scanned
properly into the system. Because of these inconsistencies, the database would treat
you as different people! We often receive redundant mail addressed to Laudon,
Lavdon, Lauden, or Landon.
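To illustrate how such near-duplicate names might be flagged, the following minimal sketch compares spellings using Python's standard `difflib` module. The function name `likely_same_person` and the 0.8 similarity threshold are assumptions chosen for this example, not a standard; a real system would also compare addresses and other fields before merging records.

```python
from difflib import SequenceMatcher


def likely_same_person(name_a: str, name_b: str, threshold: float = 0.8) -> bool:
    """Return True when two names are similar enough to be possible duplicates.

    The threshold is an illustrative assumption; it would need tuning
    against real data to balance missed duplicates and false matches.
    """
    a = name_a.strip().lower()
    b = name_b.strip().lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold


# The misspellings mentioned in the text, compared against the correct name.
variants = ["Lavdon", "Lauden", "Landon"]
flagged = [v for v in variants if likely_same_person("Laudon", v)]
print(flagged)
```

A match here only suggests that two records may describe the same person; a clerk or a further rule would still need to confirm it before the records are combined.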
If a database is properly designed and enterprise-wide data standards are
established, duplicate or inconsistent data elements should be minimal. Most data
quality problems, however, such as misspelled names, transposed numbers, or
incorrect or missing codes, stem from errors during data input. The incidence
of such errors is rising as companies move their businesses to the Web and
allow customers and suppliers to enter data into their Web sites that directly
update internal systems.
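One common safeguard is to check data at the point of entry, before it reaches internal systems. The sketch below is a hypothetical example: the field names (`name`, `zip`, `state`) and the small set of valid state codes are assumptions made for illustration. Note that format checks of this kind catch missing or malformed values but cannot detect every error, such as two transposed digits in an otherwise valid ZIP code.

```python
import re
from typing import Dict, List

# Hypothetical list of accepted codes; a real system would load the full set.
VALID_STATE_CODES = {"NY", "NJ", "CT"}


def validate_submission(form: Dict[str, str]) -> List[str]:
    """Return a list of problems found in a Web form submission (empty if none)."""
    errors = []
    if not form.get("name", "").strip():
        errors.append("name is required")
    if not re.fullmatch(r"\d{5}", form.get("zip", "")):
        errors.append("zip must be exactly 5 digits")
    if form.get("state", "").upper() not in VALID_STATE_CODES:
        errors.append("state code is missing or not recognized")
    return errors


print(validate_submission({"name": "Jane Laudon", "zip": "1001", "state": ""}))
```

Rejecting or flagging a submission at this stage is usually cheaper than locating and correcting the bad record after it has updated other systems.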
Before a new database is in place, organizations need to identify and correct
their faulty data and establish better routines for editing data once their data-
base is in operation. Analysis of data quality often begins with a data quality
audit, which is a structured survey of the accuracy and level of completeness
of the data in an information system. Data quality audits can be performed by
surveying entire data files, surveying samples from data files, or surveying end
users for their perceptions of data quality.
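As a rough illustration of the completeness portion of such an audit, the sketch below computes, for each field, the share of records that contain a value. The function name `audit_completeness` and the sample records are hypothetical. Measuring accuracy is harder, because it requires comparing stored values against a trusted source, so this example does not attempt it.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def audit_completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records with a non-empty value."""
    total = len(records)
    report = {}
    for field in fields:
        # Treat missing keys, None, and whitespace-only strings as empty.
        filled = sum(1 for r in records if (r.get(field) or "").strip())
        report[field] = filled / total if total else 0.0
    return report


# Hypothetical sample drawn from a customer file.
sample = [
    {"name": "Jane Laudon", "zip": "10001"},
    {"name": "Ken Laudon", "zip": ""},
    {"name": "", "zip": None},
    {"name": "Carol Traver"},
]
print(audit_completeness(sample, ["name", "zip"]))
```

The same function can be run against an entire file or against a sample of it, which corresponds to the first two audit approaches described above.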
Data cleansing, also known as data scrubbing, consists of activities for
detecting and correcting data in a database that are incorrect, incomplete,
improperly formatted, or redundant. Data cleansing not only corrects errors
but also enforces consistency among different sets of data that originated in
separate information systems. Specialized data-cleansing software is available
to automatically survey data files, correct errors in the data, and integrate the
data in a consistent company-wide format.
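The basic steps of data cleansing can be sketched in a few lines. The example below is a simplified illustration, not a description of any particular commercial product: it puts names and phone numbers into one consistent format, sets aside incomplete records, and removes duplicates that remain after formatting. The field names and the formatting rules are assumptions made for this example.

```python
import re
from typing import Dict, List


def normalize(record: Dict[str, str]) -> Dict[str, str]:
    """Put a record's fields into one consistent format."""
    return {
        # Collapse extra spaces and use consistent capitalization.
        "name": " ".join(record.get("name", "").split()).title(),
        # Keep digits only, so "(212) 555-0100" and "212-555-0100" agree.
        "phone": re.sub(r"\D", "", record.get("phone", "")),
    }


def cleanse(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Normalize records, skip incomplete ones, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        row = normalize(record)
        if not row["name"] or not row["phone"]:
            continue  # incomplete; a real process might route this for review
        key = (row["name"], row["phone"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


# Hypothetical records that originated in two separate systems.
raw = [
    {"name": "  jane   LAUDON", "phone": "(212) 555-0100"},
    {"name": "Jane Laudon", "phone": "212-555-0100"},
    {"name": "Ken Laudon", "phone": ""},
]
print(cleanse(raw))
```

Here the first two records differ only in formatting, so they collapse into one after normalization. Simple capitalization rules such as these will mishandle some names, which is one reason specialized data-cleansing software exists.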
Data quality problems are not just business problems. They also pose
serious problems for individuals, affecting their financial condition and even
their jobs. For example, inaccurate or outdated data about consumers’ credit
histories maintained by credit bureaus can prevent creditworthy individuals
from obtaining loans or lower their chances of finding or keeping a job.
LEARNING TRACK MODULES
The following Learning Tracks provide content relevant to topics covered in
this chapter:
1. Database Design, Normalization, and Entity-Relationship Diagramming
2. Introduction to SQL
3. Hierarchical and Network Data Models