
Research on Data Sharing Based on Knowledge Modeling

Posted on: 2012-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Shi
Full Text: PDF
GTID: 2208330335471971
Subject: Computer software and theory
Abstract/Summary:
With the development of the social credit system, sharing credit information has become increasingly urgent for conducting enterprise business, handling individual economic transactions, dealing with financial business and carrying out government affairs. However, according to surveys, enterprise credit information is distributed across banking, commerce, tax, securities regulation, customs, public security, justice, finance, auditing, quality supervision, environmental protection and other government departments. How government departments can share this distributed enterprise credit information, once it has been integrated and aggregated by a credit center, has therefore become an important issue. Integrating and interconnecting credit information across government and enterprise departments, and reusing and exchanging credit data, is largely a matter of sharing credit information resources. To address this problem, this paper studies data cleaning, data matching and knowledge modeling in order to implement data sharing among different departments. The work is as follows:

Firstly, a flexible, extensible and dynamic data-sharing model is proposed. The model contains three parts: a data cleaning model, a data matching model and a data sharing engine. The data sharing engine first notifies the data cleaning model to obtain the user's dynamic cleaning knowledge and information, and then informs the data matching model to obtain the user's dynamic matching knowledge and information. Finally, it collects the cleaning and matching knowledge and information generated dynamically in these steps, executes the cleaning and matching algorithms with them, and ultimately generates the shared credit data.

Secondly, considering the limited extensibility, flexibility and efficiency of current data cleaning tools, this paper proposes a knowledge-based data cleaning model to solve this problem.
The model contains three parts: a rule learning model, a data learning model and a duplicate learning model. The rule learning model establishes the specific correspondence between dynamic rules and data quality issues, which generates rule information. The data learning model initializes the dynamic data information, which produces initialization data information. The duplicate learning model establishes the correspondence between the clustering of dynamic records and category feedback, which forms the best category information, and then carries out window sorting learning to run deduplication. The cleaning engine executes the cleaning algorithm by loading the rule information and data information, and then runs the duplicate-record algorithm by loading the category information. Applying the model improves on the extensibility, flexibility and efficiency of current data cleaning tools and ensures the quality of dynamic data. A flexible, extensible and general data cleaning tool is built using the model, in theory and in practice.

Thirdly, cleaning approximately duplicate records (CADR) is a core issue in the data cleaning domain, but how to implement valid and practical CADR remains a research difficulty. On this basis, this paper proposes a clustering feedback pattern specification (CFPS) to verify the validity of CADR. First, the concepts of cluster pattern and feedback pattern and their algorithms are given, based on an analysis of the function-to-function relation of the clustered subclass categories. Then CFPS is proposed for the data cleaning domain. Finally, an example arising from a credit data exchange system is given to test the validity of CFPS.

Fourthly, a framework for data matching based on the knowledge model is given. The framework contains three parts: matching rule learning, matching data learning and a data matching engine.
The matching rule learning model establishes the specific correspondence between matching rules and matching problems, which generates dynamic matching rule information. The matching data learning model initializes the dynamic standard data, which produces standard data information. The data matching engine executes the consistency matching algorithms for identity and data by loading the matching rule information and standard data information. Finally, the model's performance in terms of time and quality is verified by experiment.

Lastly, algorithms for data cleaning and data alignment are put forward. The data cleaning algorithm consists of an attribute cleaning algorithm (ACA) and a record duplicate algorithm (RDA). ACA detects, eliminates and repairs dirty data; RDA recognizes, merges and deletes duplicate records. The data alignment algorithm includes a standard data alignment algorithm (SDAA) and a non-standard data alignment algorithm (NSDAA). SDAA inputs and matches the data of standard departments according to benchmark matching rules, which produces the standard information for data alignment. NSDAA matches and enters the data of non-standard departments according to that standard information and the matching rules of data alignment, which finally generates the shared data of standard and non-standard departments.
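
As a rough illustration of the cleaning stage described in the abstract, the following Python sketch applies rule-driven attribute cleaning and then a window-sorting pass over the sorted records to find approximate duplicates. It is a minimal sketch under stated assumptions, not the thesis's ACA or RDA: the rule table, the field names (`name`, `tax_id`), the similarity threshold, the use of `difflib.SequenceMatcher` as the similarity measure, and the choice to keep the first record of each duplicate pair (rather than merging) are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical cleaning rules: field name -> normalizing function.
CLEANING_RULES = {
    "name": lambda v: " ".join(v.split()).upper(),   # collapse spaces, unify case
    "tax_id": lambda v: v.replace("-", "").strip(),  # drop separators
}


def clean_record(record, rules=CLEANING_RULES):
    """Attribute cleaning: apply each rule to its field, leave other fields as-is."""
    return {k: rules[k](v) if k in rules else v for k, v in record.items()}


def similar(a, b, threshold=0.9):
    """Assumed similarity test: character-level ratio from difflib."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def find_duplicates(records, key="name", window=3, threshold=0.9):
    """Window sorting: sort by key, then compare each record only with
    the next (window - 1) records in sorted order."""
    order = sorted(range(len(records)), key=lambda i: records[i][key])
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if similar(records[i][key], records[j][key], threshold):
                pairs.append((min(i, j), max(i, j)))
    return pairs


def share(records):
    """Clean every record, then drop the later record of each duplicate pair."""
    cleaned = [clean_record(r) for r in records]
    drop = {j for _, j in find_duplicates(cleaned)}
    return [r for idx, r in enumerate(cleaned) if idx not in drop]
```

Because the rules live in a plain dictionary, new cleaning rules can be added without changing the engine code, which loosely mirrors the extensibility goal stated in the abstract.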
Keywords/Search Tags:dynamic rule, data quality, knowledge modeling, rule learning, data cleaning, data comparison