Research On Multi-Source Data Cleaning And Application In The Financial Department Budget

Posted on: 2010-10-28 | Degree: Master | Type: Thesis
Country: China | Candidate: M Gao
GTID: 2178330338985614 | Subject: Computer application technology
Abstract/Summary:
Financial departments in all regions are currently carrying out large-scale system construction under the Jin Cai project, with the aim of consolidating financial data scattered across isolated information islands into a unified, integrated data store and thereby achieving maximum integration of financial resources. This process requires appropriate data cleaning strategies so that financial budgeting and decision-making rest on accurate, consistent, and complete information.

The thesis first outlines the Jin Cai project and data warehousing technology, then analyzes data quality problems, which leads into the relevant principles of data cleaning, with a focus on matching a cleaning strategy to each type of data quality problem. It then describes the application of multi-source data cleaning in building a financial budget data platform and elaborates an ETL application model built around a data cleaning module. The discussion of the platform's overall architecture provides an efficient data platform for the budget data warehouse.

Next, targeting the characteristics of financial budget data, the thesis improves the duplicate-record cleaning algorithm in three ways. First, it improves how record matching is calculated: text similarity is computed as a relative measure combined with a threshold, which resolves some spelling errors and most mismatches between old 15-digit and new 18-digit identity card numbers. Second, it runs the Sorted Neighborhood Method (SNM) twice, with a different sort key in each pass, to improve the detection of duplicate records. Third, it uses a variable window to improve cleaning efficiency. The thesis also describes how the data cleaning algorithm module of the ETL tool handles vacant values, single status bar fields, non-standard fields, and untrustworthy data.
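The improved record-matching rule can be illustrated with a minimal Python sketch. It assumes records are dictionaries with hypothetical `id` and `name` fields, measures relative text similarity as one minus the Levenshtein distance divided by the length of the longer string (the thesis's exact measure may differ), and uses an assumed threshold of 0.85. Before comparing, it upgrades old 15-digit identity card numbers to the 18-digit form by inserting the century "19" and appending the ISO 7064 MOD 11-2 check character defined in GB 11643-1999, so that an old and a new number for the same person compare as identical.

```python
# Sketch only: the field names, the similarity measure, and the 0.85 threshold
# are illustrative assumptions, not taken from the thesis.

# Weights and check characters for 18-digit resident ID numbers
# (ISO 7064 MOD 11-2, as specified in GB 11643-1999).
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"


def upgrade_id(id_number: str) -> str:
    """Convert an old 15-digit ID number to the new 18-digit form."""
    id_number = id_number.strip().upper()
    if len(id_number) != 15 or not id_number.isdigit():
        return id_number  # already 18 digits or malformed: leave unchanged
    body = id_number[:6] + "19" + id_number[6:]  # birth date YYMMDD -> 19YYMMDD
    total = sum(int(d) * w for d, w in zip(body, WEIGHTS))
    return body + CHECK_CHARS[total % 11]


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Relative similarity in [0, 1]: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def is_duplicate(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Average the per-field similarities and compare the result with the threshold."""
    id_sim = similarity(upgrade_id(r1["id"]), upgrade_id(r2["id"]))
    name_sim = similarity(r1["name"].strip().lower(), r2["name"].strip().lower())
    return (id_sim + name_sim) / 2 >= threshold
```

With these assumptions, `is_duplicate({"id": "110105491231002", "name": "Zhang Wei"}, {"id": "11010519491231002X", "name": "Zhang Wie"})` returns `True`: the two ID numbers become identical after the upgrade, and the transposed letters still leave a name similarity of about 0.78.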
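Running SNM twice with different keys can be sketched as follows. Each pass sorts the records by one key, slides a fixed window over the sorted list, and compares only records that fall inside the same window; the duplicate pairs found by both passes are then merged. The keys (`name` and `id`), the window size, and the exact-equality `match` rule are hypothetical stand-ins chosen to keep the example small; a similarity-based rule such as the thesis's threshold matching would be passed in as `match` instead.

```python
from operator import itemgetter
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]
Pair = Tuple[int, int]
KeyFn = Callable[[Record], str]
MatchFn = Callable[[Record, Record], bool]


def snm_pass(records: List[Record], key: KeyFn, match: MatchFn, window: int) -> Set[Pair]:
    """One Sorted Neighborhood pass: sort by `key`, compare within the window."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs: Set[Pair] = set()
    for pos, i in enumerate(order):
        # Compare the current record with the next (window - 1) records only.
        for j in order[pos + 1:pos + window]:
            if match(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def two_pass_snm(records: List[Record], key1: KeyFn, key2: KeyFn,
                 match: MatchFn, window: int = 3) -> Set[Pair]:
    """Run SNM with two different sort keys and merge the duplicate pairs found."""
    return snm_pass(records, key1, match, window) | snm_pass(records, key2, match, window)


def match(a: Record, b: Record) -> bool:
    # Hypothetical rule for illustration: same ID number or same name.
    return a["id"] == b["id"] or a["name"] == b["name"]


records = [
    {"name": "Chen Jie", "id": "A100"},
    {"name": "Du Min", "id": "A300"},
    {"name": "Fu Qiang", "id": "A200"},
    {"name": "Xu Jie", "id": "A100"},  # same ID as record 0, different recorded name
    {"name": "Du Min", "id": "A900"},  # same name as record 1, different recorded ID
]
```

With a window of 2, sorting by name alone finds only `(1, 4)`; `two_pass_snm(records, itemgetter("name"), itemgetter("id"), match, window=2)` also finds `(0, 3)`, because records 0 and 3 become neighbors only when sorted by ID.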
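One possible way to realize a variable window is sketched below; the abstract does not give the thesis's rule, so this adaptive scheme is an assumption. Each record starts with a small window, and the window grows by one position only while the record at its edge is still reasonably similar (at or above `grow_t`), up to `max_w`. Neighborhoods of dissimilar keys therefore cost few comparisons, while runs of similar keys get a wider search. The thresholds, the window bounds, and the Levenshtein-based similarity are illustrative.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Relative similarity in [0, 1]: 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def variable_window_snm(keys: List[str], match_t: float = 0.85, grow_t: float = 0.6,
                        min_w: int = 2, max_w: int = 6) -> Tuple[List[Tuple[str, str]], int]:
    """Return (duplicate key pairs, number of comparisons made)."""
    keys = sorted(keys)
    n = len(keys)
    pairs: List[Tuple[str, str]] = []
    comparisons = 0
    for i in range(n):
        end = min(i + min_w, n)       # exclusive end of the current window
        hard_end = min(i + max_w, n)  # the window may never grow past this
        j = i + 1
        while j < end:
            s = similarity(keys[i], keys[j])
            comparisons += 1
            if s >= match_t:
                pairs.append((keys[i], keys[j]))
            # The record at the window's edge is still similar: widen by one.
            if j == end - 1 and s >= grow_t and end < hard_end:
                end += 1
            j += 1
    return pairs, comparisons
```

For example, `variable_window_snm(["liuyangbo", "liuyanq", "liuyang"])` returns `([("liuyang", "liuyanq")], 3)`. After sorting, the two near-duplicates are two positions apart, so a fixed window of 2 would never compare them, but the intermediate key `"liuyangbo"` is similar enough (about 0.78) to widen the window by one.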
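A rule-per-problem cleaning step in the ETL module might look like the following sketch. It covers three of the problem types listed (vacant values, non-standard fields, and untrustworthy data); the field names (`gender`, `wage`, `unit`), the code table, the default values, and the wage range are all hypothetical. Vacant values are filled from configured defaults, non-standard values are mapped onto a single code, and untrustworthy values are flagged for review rather than silently corrected.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Hypothetical code table for standardizing a free-text gender field.
GENDER_CODES = {"m": "1", "male": "1", "男": "1", "f": "2", "female": "2", "女": "2"}


def clean_record(rec: Record, defaults: Dict[str, str],
                 max_wage: Decimal = Decimal("100000")) -> Tuple[Record, List[str]]:
    """Apply one rule per data quality problem; return the cleaned copy and issues."""
    out = dict(rec)  # never modify the source record in place
    issues: List[str] = []

    # Vacant values: fill with a configured default and note that we did so.
    for field, default in defaults.items():
        if not (out.get(field) or "").strip():
            out[field] = default
            issues.append(f"filled {field}")

    # Non-standard values: map free-text variants onto one code.
    gender = (out.get("gender") or "").strip().lower()
    if gender in GENDER_CODES:
        out["gender"] = GENDER_CODES[gender]
    elif gender:
        issues.append("non-standard gender")

    # Untrustworthy values: a wage that is not a number or lies outside the
    # plausible range is flagged for manual review, not repaired automatically.
    try:
        wage = Decimal(out.get("wage") or "")
        if not Decimal("0") <= wage <= max_wage:
            issues.append("wage out of range")
    except InvalidOperation:
        issues.append("wage not numeric")

    return out, issues
```

Keeping the issue list separate from the cleaned record lets the ETL workflow load clean rows directly while routing flagged rows to a review queue.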
In addition, the thesis summarizes how data cleaning is implemented in the ETL workflow.

Finally, the data cleaning module of the ETL tool is tested on part of the data in the personnel management information database and the integrated financial wage database, and the performance of the improved SNM is tested by comparing the duplicate records identified during cleaning. Taken together, the test results show that the data cleaning module of the ETL tool can apply different cleaning strategies to different data quality problems on the financial sector's budget data platform. This addresses the data-source problems involved in analyzing financial supply policy and adjusting budget policy in the financial sector.
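The comparison of duplicate records used to evaluate the improved SNM could be scored as in the sketch below. The abstract does not say which measures the thesis reports, so precision and recall against a hand-labeled set of true duplicate pairs are an assumption here; each pair is normalized so that `(4, 1)` and `(1, 4)` count as the same pair.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Order each pair so that (j, i) and (i, j) are treated as one pair."""
    return {(min(a, b), max(a, b)) for a, b in pairs}


def precision_recall(detected: Iterable[Pair], truth: Iterable[Pair]) -> Tuple[float, float]:
    """Precision: share of detected pairs that are true duplicates.
    Recall: share of true duplicate pairs that were detected."""
    found, actual = normalize(detected), normalize(truth)
    hits = len(found & actual)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

Scoring a single-pass result `{(1, 4)}` against the true pairs `{(0, 3), (1, 4)}` gives precision 1.0 and recall 0.5, while a result that also contains `(0, 3)` raises recall to 1.0.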
Keywords/Search Tags: Data Quality Problems, Data Cleaning, Financial Department Budget