
Research And Implementation Of Data Cleansing Based On Clustering Algorithm

Posted on: 2009-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2178360242486967
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, managers depend more and more on data for decision analysis. Source data, which often contains many errors, is loaded and refreshed into the data warehouse, so it is essential to clean this data before it enters the warehouse. This thesis first introduces the basic concepts and significance of data cleaning, together with current research and applications both at home and abroad. The theories, methods, evaluation standards and basic workflow of data cleaning are then summarized and described. On this basis, the clustering algorithm DBSCAN is improved and applied to filling in missing data during cleaning. A corresponding experiment is conducted on a public measurement data set, and comparison with the results of traditional methods shows improved accuracy. Finally, an improved algorithm and experiments for eliminating repeated records in the data warehouse are introduced.
Keywords/Search Tags:data cleaning, clustering algorithm, missing data filling, repeated records elimination
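
The abstract does not describe the thesis's improved DBSCAN, so the following is only a minimal sketch of the general idea of cluster-based missing-value filling, not the author's method. It assumes plain (unmodified) DBSCAN over the complete records, Euclidean distance, and a simple fill rule: each incomplete record joins the cluster of its nearest complete neighbour (measured on the attributes it does have) and takes the cluster mean for the missing attribute. The function names `dbscan` and `fill_missing`, the parameters `eps` and `min_pts`, and the sample values are all hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = [j for j in range(len(points))
                     if math.dist(points[i], points[j]) <= eps]
        if len(neighbors) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [k for k in range(len(points))
                  if math.dist(points[j], points[k]) <= eps]
            if len(nb) >= min_pts:   # j is a core point: keep expanding
                queue.extend(nb)
    return labels

def fill_missing(records, eps=1.0, min_pts=2):
    """Fill None values with the mean of the nearest cluster.

    Assumption: at least one record has no missing values.
    """
    complete = [r for r in records if None not in r]
    labels = dbscan(complete, eps, min_pts)
    clustered = [(c, l) for c, l in zip(complete, labels) if l != -1]
    # If DBSCAN found no cluster, fall back to all complete records.
    candidates = clustered or [(c, -1) for c in complete]
    filled = []
    for r in records:
        if None not in r:
            filled.append(list(r))
            continue
        seen = [k for k, v in enumerate(r) if v is not None]
        # Nearest complete record, compared on observed attributes only.
        _, label = min(
            candidates,
            key=lambda cl: math.dist([r[k] for k in seen],
                                     [cl[0][k] for k in seen]))
        members = ([c for c, l in clustered if l == label]
                   if label != -1 else complete)
        filled.append([v if v is not None
                       else sum(m[k] for m in members) / len(members)
                       for k, v in enumerate(r)])
    return filled

# Hypothetical data: two dense groups plus two records with a gap each.
records = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
           [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
           [1.1, None], [None, 8.0]]
filled = fill_missing(records, eps=1.0, min_pts=2)
```

In this sketch the record `[1.1, None]` is filled from the group around (1, 1) and `[None, 8.0]` from the group around (8, 8). Filling from a local cluster rather than the global column mean is the usual motivation for clustering-based imputation; how the thesis modifies DBSCAN to improve on this is not stated in the abstract.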