
Research And Application Of Data Cleaning In Guizhou Local Tax Projects

Posted on: 2013-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wu
Full Text: PDF
GTID: 2248330395986448
Subject: Systems analysis and integration
Abstract/Summary:
With the development of computer application techniques and theory, Management Information Systems (MIS) are now widely used in many fields and have accumulated large amounts of historical data. As business data grows and business complexity increases, data quality problems have become increasingly prominent. Once the importance of solving these problems was recognized, researchers proposed frameworks and ideas for detecting and cleaning data quality problems, and many database vendors developed their own data cleansing tools based on them. The application of this theory and of these tools has played a good role in improving data quality, which reflects the importance of data cleansing.
In the Guizhou provincial local tax data centralization project, the tax data of the nine cities (prefectures, regions) and of one unit directly under the province must be cleaned before centralization, and the centralized provincial tax data must be cleaned as well. This thesis mainly describes the design and implementation of the cleaning program used before centralization. After analyzing the cleaning tools currently on the market and the environment of each city (prefecture, region), the final choice was to write a purpose-built application. To make data quality problems easier to analyze and find, class diagrams and E-R models of each city's (prefecture's, region's) data were produced through reverse engineering and other techniques. These class diagrams and E-R models facilitated communication and coordination between business and technical staff, and effectively guided the formulation of cleaning rules and the writing of stored procedures. The cleaning rules fall into three categories: error-checking rules, modification rules and backup rules.
The core idea of the purpose-built application is to store the cleaning rules in a rule table and then use hand-written stored procedures and functions to apply those rules dynamically. Cleaning is a continuous process, because new data quality problems may arise during the daily operation of the production system. To keep the production data of each city (prefecture, region) well analyzed and cleaned, cleaning rules can be conveniently added or amended as needed during later maintenance of the application.
Solutions are also proposed for several technical difficulties encountered during data cleaning, such as how to store and schedule the formulated analysis rules properly, how to correctly use the physical Rowid to access data, and how to process and optimize tables holding large amounts of data.
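The rule-table idea can be illustrated with a minimal sketch. It is not the thesis's implementation: the thesis uses stored procedures and functions in the project's database, whereas this example substitutes Python's standard `sqlite3` module. The table names (`taxpayer`, `taxpayer_backup`, `cleaning_rule`), the columns, the sample rows and the rule SQL are all hypothetical and chosen only to show how rules kept as data can be executed dynamically, in the order check, backup, modify.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and rules."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taxpayer (id INTEGER PRIMARY KEY, name TEXT, tax_code TEXT);
        CREATE TABLE taxpayer_backup (id INTEGER, name TEXT, tax_code TEXT);
        -- The rule table: each row is one cleaning rule stored as data.
        CREATE TABLE cleaning_rule (
            rule_id   INTEGER PRIMARY KEY,
            category  TEXT CHECK (category IN ('check', 'backup', 'modify')),
            run_order INTEGER,
            sql_text  TEXT
        );
    """)
    conn.executemany("INSERT INTO taxpayer VALUES (?, ?, ?)", [
        (1, "Alpha Co", "520100"),
        (2, "  Beta Ltd ", "520200"),   # stray whitespace
        (3, "Gamma Inc", None),         # missing tax code
    ])
    suspect = "tax_code IS NULL OR name <> TRIM(name)"
    conn.executemany("INSERT INTO cleaning_rule VALUES (?, ?, ?, ?)", [
        # Error-checking rule: count the suspect rows.
        (1, "check", 1, f"SELECT COUNT(*) FROM taxpayer WHERE {suspect}"),
        # Backup rule: copy the suspect rows before anything is changed.
        (2, "backup", 2,
         f"INSERT INTO taxpayer_backup SELECT * FROM taxpayer WHERE {suspect}"),
        # Modification rule: repair what can be repaired automatically.
        (3, "modify", 3,
         "UPDATE taxpayer SET name = TRIM(name) WHERE name <> TRIM(name)"),
    ])
    conn.commit()
    return conn


def run_rules(conn):
    """Read the rules from the rule table and execute them in run_order."""
    report = []
    rules = conn.execute(
        "SELECT rule_id, category, sql_text FROM cleaning_rule ORDER BY run_order"
    ).fetchall()
    for rule_id, category, sql_text in rules:
        cur = conn.execute(sql_text)
        if category == "check":
            affected = cur.fetchone()[0]   # number of rows flagged by the check
        else:
            affected = cur.rowcount        # rows copied or modified
        report.append((rule_id, category, affected))
    conn.commit()
    return report


if __name__ == "__main__":
    db = build_demo_db()
    for line in run_rules(db):
        print(line)
```

Because the rules live in a table, adding or amending a rule during later maintenance is an `INSERT` or `UPDATE` on `cleaning_rule` rather than a change to the program. Rows the modification rule cannot repair, such as the missing tax code here, remain in the backup table for manual follow-up.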
Keywords/Search Tags: Data cleaning, data quality, analysis of rules, partition table