
Data Bryte: A standards/model-based data cleansing framework

Posted on: 2001-10-22
Degree: D.C.S
Type: Dissertation
University: Colorado Technical University
Candidate: Mohan, Steven Douglas
Full Text: PDF
GTID: 1468390014958287
Subject: Computer Science
Abstract/Summary:
Modern corporations amass valuable “oceans” of data daily. Within the telecommunications industry, for example, companies often accrue more than half a terabyte (1 TB = 1,000,000,000,000 bytes) a day in just one of the hundreds of databases that serve both residential and commercial markets. Typically these data come from multiple sources, both internal and external to the corporation. To avoid drowning in this sea of data during analysis, the data are migrated, aggregated, and summarized into data warehouses.

Unfortunately, the bulk of the data contain a significant number of errors. The data residing in the operational transaction databases, though perhaps of sufficient quality for transaction processing, contain duplications, inconsistencies, errors, missing data, and other issues that make them unsuitable for a decision support data warehouse without considerable “cleansing”. Data cleansing is not considered a “sexy” proposition, yet this scrubbing consumes 60–80% of the effort required to build a functional and effective data warehouse.

A database was constructed using proprietary telecommunications domain data. Its cleanliness was verified through visual inspection and computerized searching of the existing data. The database was then corrupted in accordance with error patterns found within the telecommunications data. The injected error patterns matched the existing error patterns either exactly or to within one percent. The errors were again verified through visual inspection and computerized searching of the data.

Data Bryte, a proposed new standards/model-based cleansing methodology that offers a coherent and focused approach for attaining improved information quality, was constructed and tested. In general, the more context the methodology provided for error searching, the more clearly Data Bryte proved to be a superior error-cleansing mechanism. Specific conclusions and recommendations are presented, along with a small number of areas that merit further in-depth investigation.
Keywords/Search Tags: Data, Cleansing