
Data Cleaning and Its Application in the Planning Value System of Baosteel Group Corporation

Posted on: 2006-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Fu
GTID: 2168360152987308
Subject: Computer application technology
Abstract/Summary:
As enterprises become increasingly information-driven, managing their operational data grows more and more difficult. Under the principle of "garbage in, garbage out", management data can support decision-makers only if it is accurate and reflects the actual state of the enterprise, so enterprise data management is receiving growing attention. This thesis approaches enterprise data management from the perspective of data cleaning.

The Planning Value System implemented at Baosteel Group Corporation relies on analysis of the foundation data (both historical and real-time data), on cleaning out garbage data, and on the estimated value (the Planning Value). By analyzing why the actual value differs from the estimated value, managers can identify where management should improve and can help complete the management cycle. Planning Value management therefore provides an effective tool for many management functions, a practical way to raise the efficiency of foundation management, and a good platform for exception management.

Because the foundation data has many accuracy problems that would disrupt the normal operation of the Planning Value System, data cleaning techniques must be applied to the data before it is put into use.

This thesis analyzes data cleaning techniques in detail, together with their application in the Planning Value System of Baosteel Group Corporation, in four parts. The first part summarizes data cleaning techniques. It explains the origin and definition of data cleaning, and then gives a complete introduction to the data quality problems that may be encountered and to the common steps of data cleaning.
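The abstract does not specify the cleaning rules or the schema of the foundation data. As a minimal sketch under those gaps, the Python code below assumes each record is a dict with hypothetical fields `item`, `actual`, and `estimate` (the Planning Value). It applies three commonly used cleaning steps (dropping incomplete records, dropping out-of-range values, removing exact duplicates) and then flags items whose actual value deviates from the estimate by more than an assumed 10% relative tolerance, the kind of difference analysis that exception management relies on. The bounds, tolerance, and sample records are illustrative only.

```python
def clean(records, low=0.0, high=1e6):
    """Apply three basic cleaning steps to raw foundation-data records.

    Assumed, illustrative rules (not taken from the thesis):
      1. drop records whose 'actual' or 'estimate' is missing;
      2. drop records whose values fall outside [low, high];
      3. drop exact duplicates, keeping the first occurrence.
    """
    seen = set()
    cleaned = []
    for rec in records:
        actual, estimate = rec.get("actual"), rec.get("estimate")
        if actual is None or estimate is None:
            continue  # step 1: incomplete record
        if not (low <= actual <= high and low <= estimate <= high):
            continue  # step 2: out-of-range value
        key = (rec.get("item"), actual, estimate)
        if key in seen:
            continue  # step 3: exact duplicate
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def exceptions(records, tolerance=0.10):
    """Return (item, relative deviation) for records whose actual value
    differs from the estimate by more than `tolerance`."""
    flagged = []
    for rec in records:
        estimate = rec["estimate"]
        if estimate == 0:
            continue  # relative deviation is undefined
        deviation = (rec["actual"] - estimate) / estimate
        if abs(deviation) > tolerance:
            flagged.append((rec["item"], round(deviation, 3)))
    return flagged


# Made-up sample records for illustration.
raw = [
    {"item": "A", "actual": 105.0, "estimate": 100.0},
    {"item": "A", "actual": 105.0, "estimate": 100.0},  # duplicate
    {"item": "B", "actual": None, "estimate": 80.0},     # missing value
    {"item": "C", "actual": -5.0, "estimate": 50.0},     # out of range
    {"item": "D", "actual": 130.0, "estimate": 100.0},
]
data = clean(raw)
print([r["item"] for r in data])  # ['A', 'D']
print(exceptions(data))           # [('D', 0.3)]
```

Cleaning runs before the deviation check so that missing, invalid, or duplicated foundation data cannot distort the comparison between actual and estimated values.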
The first part closes with a detailed explanation of a clustering algorithm based on N-grams.

The second part presents the requirements analysis and part of the related development design of the Planning Value System at Baosteel Group Corporation. It discusses the difficulties, the key techniques, and the characteristics of the dirty data inside the system. It also explains the advantages of the related functional support in SAS, whose data warehousing system has a dedicated mechanism for checking external data and integrating data from multiple sources. SAS is therefore well suited to the data cleaning work that the Planning Value System requires.

The third part describes how SAS was used to implement the data cleaning operations, and then implements a clustering algorithm based on Euclidean distance. The results show that this algorithm meets the basic requirements of the Planning Value System implemented at Baosteel Group Corporation and effectively resolves the problem of duplicate data, which is severe in the foundation data.

Finally, the thesis proposes several directions in data cleaning worth further research: ease of operation, validity, compatibility, and generality.
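The abstract names two clustering approaches to duplicate data, one based on N-grams and one based on Euclidean distance, but gives no details of either. The sketch below is an assumption, not the thesis's implementation (which was built with SAS): a single-pass "leader" clustering in Python, where each record joins the first cluster whose first member is close enough and otherwise starts a new cluster. Closeness is measured either by the Dice coefficient over character bigrams (an N-gram measure with N = 2) for text keys, or by Euclidean distance for numeric attribute vectors. Records that end up in the same cluster are candidate duplicates. All thresholds, names, and values are hypothetical.

```python
import math


def ngrams(text, n=2):
    """Set of character n-grams of a lowercased, space-padded string."""
    s = f" {text.lower().strip()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=2):
    """Dice coefficient over the two n-gram sets, in [0, 1]."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def leader_cluster(items, close):
    """Single-pass clustering: each item joins the first cluster whose
    leader (first member) satisfies close(leader, item); otherwise it
    starts a new cluster. Returns clusters as lists of indices."""
    clusters = []
    for i, item in enumerate(items):
        for cluster in clusters:
            if close(items[cluster[0]], item):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical text keys: indices 0, 1 and 3 name the same product.
names = ["Hot rolled coil", "hot-rolled coil", "Cold rolled sheet", "Hot rolled coil "]
print(leader_cluster(names, lambda a, b: ngram_similarity(a, b) >= 0.8))
# [[0, 1, 3], [2]]

# Hypothetical numeric records, assumed to be on comparable scales.
readings = [(10.0, 20.0), (10.1, 19.9), (30.0, 5.0), (9.9, 20.05)]
print(leader_cluster(readings, lambda u, v: math.dist(u, v) <= 0.5))
# [[0, 1, 3], [2]]
```

Euclidean distance is only meaningful when attributes share comparable scales, so numeric fields would normally be normalized first. A single-pass scheme also depends on input order, which is acceptable for a sketch but would matter on real data.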
Keywords/Search Tags: data cleaning, data warehousing, planning value, clustering