Research Of Data Cleaning Method Based On Data Warehouse

Posted on:2005-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:Z F Zhou

Full Text:PDF

GTID:2168360122471117

Subject:Computer applications

Abstract/Summary:

In today's world, the need for enterprise informatization is increasingly urgent, and the management of enterprise data is an important part of it. Following the principle of "garbage in, garbage out", managed data must be reliable, free of errors, and a true reflection of the enterprise's actual situation if it is to support correct decisions. Data quality management is therefore receiving increasing attention, and data cleaning is an important method of improving data quality.

The use of data warehouses largely reflects an enterprise's degree of informatization. A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data. Because the data warehouse is the basis of decision-making, the validity of its data is vital to avoiding wrong decisions. In many cases, the data in a data warehouse comes from multiple operational data sources, which may reside on different hardware platforms and run different operating systems. As a result, data drawn from these sources inevitably contains inconsistencies. The objective of data cleaning is to resolve the data quality problems that arise in this way, so data cleaning is regarded as one of the most important problems in building a data warehouse.

One kind of data quality problem arises when a single real-world entity is represented by several records that are not completely identical, called approximately duplicated records. Detecting and eliminating approximately duplicated records is one of the main problems that data cleaning must solve in order to improve data quality. The process of detecting approximately duplicated records can be called the record matching process.

Building on an analysis of the current problems in data cleaning, and in particular on extensive study of how approximately duplicated records are detected and eliminated, this thesis proposes RDBMS-based methods for record matching and for eliminating approximately duplicated records, with the aim of eliminating such records in a data warehouse. Experiments on a large database show that the proposed methods are efficient at eliminating approximately duplicated records.
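
To make the idea of record matching concrete, the following is a minimal Python sketch. It is not the thesis's RDBMS-based method, which the abstract does not describe. The field names, the 0.85 threshold, the sample records, and the use of `difflib.SequenceMatcher` as the string similarity measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(value).lower().split())


def record_similarity(a, b, fields):
    """Average per-field string similarity (0.0 to 1.0) between two records."""
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_approximate_duplicates(records, fields, threshold=0.85):
    """Return index pairs whose similarity meets the threshold.

    The threshold is an assumed value, not one taken from the thesis.
    Every pair is compared, so the cost is O(n^2): fine for a demo,
    too slow for a real warehouse without some blocking or sorting step.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if record_similarity(a, b, fields) >= threshold
    ]


# Hypothetical sample data: the first two records describe the same person.
records = [
    {"name": "John Smith", "city": "New York"},
    {"name": "Jon Smith", "city": "New  York"},
    {"name": "Alice Wong", "city": "Boston"},
]
pairs = find_approximate_duplicates(records, ["name", "city"])
print(pairs)
```

Here the first two records are reported as a candidate duplicate pair, while the third matches neither. Elimination would then keep one record from each matched pair or merge their fields; which policy is appropriate depends on the data and is left open in this sketch.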

Keywords/Search Tags:

data warehouse, data quality, data cleaning, approximately duplicated records, record matching

Related items

1	The Research And Application Of Duplicated Records And Incomplete Data's Cleaning Approach
2	Research On Data Cleaning Of Approximately Duplicated Records
3	Some Main Technology's Research Of Data Cleaning
4	Research On Detection Of Approximate Duplicate Records For Massive Data
5	Similar Repetitive Record Detection Method In Uncertainty Database
6	Research And Implementation Of Data Cleaning System Based On Pre-Processing Techniques
7	Study Of Data Cleaning Algorithms Based On Data Warehouse
8	Research And Application Of Data Cleaning In The Construction Of POI Data Warehouse
9	Research On The Method Of Approximately Duplicated Records Detection For Text Data In Big Data Environment
10	Data Bank Data Warehouse Build Process Of Cleaning And VIP Clients Of The Excavation