Research And Implementation Of An Incremental Data Detection Method Based On Snapshot Differential Technology

Posted on: 2011-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: C T Li
GTID: 2178360305462467
Subject: Computer software and theory
Abstract/Summary:
One of the key issues in data integration is detecting incremental data from a given information source efficiently and in a timely manner. Various approaches have been proposed to tackle this problem. This thesis proposes an incremental data detection method based on snapshot differential technology. The method does not rely on the implementation mechanism of the information sources and is therefore more widely applicable and adaptable. First, the thesis gives a formal description of the snapshot differential problem and briefly reviews two traditional snapshot differential algorithms, Sort Merge and Partition Hash. A cost analysis of these two algorithms identifies room for improvement. The thesis then presents a fingerprint-based snapshot differential algorithm for incremental data detection. It first introduces the idea of a message digest, which is used to improve efficiency, and then gives a theoretical analysis of the limitations and correctness of the proposed algorithm. The algorithm uses MD5 to compute a fingerprint of each record in the snapshot and then compares the fingerprints of corresponding records instead of comparing all of their fields. As demonstrated later in the thesis, this significantly reduces the number of comparisons and the amount of I/O relative to the traditional algorithms. This algorithm and the two traditional ones have been implemented as a snapshot differential module in a data integration system; the thesis describes its data structures, module structure and processing flow in detail. Experimental results on several test datasets are then presented.
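The fingerprint comparison described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names (`fingerprint`, `build_fingerprints`, `snapshot_diff`), the tuple record layout, the assumption that each record carries a unique key in one field, and the length-prefixed field encoding are all assumptions made for illustration.

```python
import hashlib


def fingerprint(record):
    """Return the 16-byte MD5 digest of a record's fields (hypothetical encoding)."""
    h = hashlib.md5()
    for field in record:
        data = str(field).encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()


def build_fingerprints(snapshot, key_index=0):
    """Map each record's key to its fingerprint (assumes keys are unique)."""
    return {rec[key_index]: fingerprint(rec) for rec in snapshot}


def snapshot_diff(old_fps, new_snapshot, key_index=0):
    """Compare a new snapshot against the fingerprints of the previous one.

    Returns (inserted records, updated records, deleted keys, new fingerprints).
    Only fingerprints are compared, never the individual fields.
    """
    inserts, updates, new_fps = [], [], {}
    for rec in new_snapshot:
        key = rec[key_index]
        fp = fingerprint(rec)
        new_fps[key] = fp
        old_fp = old_fps.get(key)
        if old_fp is None:
            inserts.append(rec)      # key not present in the old snapshot
        elif old_fp != fp:
            updates.append(rec)      # same key, different content
    deletes = [key for key in old_fps if key not in new_fps]
    return inserts, updates, deletes, new_fps
```

In this sketch only the key-to-fingerprint map of the previous snapshot has to be retained between runs, not the full records. One limitation of any digest-based comparison is that two different records can, with very small probability, share a fingerprint, in which case an update would go undetected.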
The thesis then gives a quantitative comparison of the three algorithms (Sort Merge, Partition Hash and the fingerprint-based algorithm) and of a detection method that uses the set difference operation provided by the DBMS, and on that basis discusses their efficiency and applicability. Finally, the incremental data detection method implemented in this thesis has been successfully deployed in the Telecoms Company Business Information Integration Support Platform, where it extracts incremental data automatically.
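For reference, detection by a DBMS set difference operation might look like the following sketch. The abstract does not say which DBMS or operator the thesis used; SQLite's `EXCEPT` operator, the table names `snap_old` and `snap_new`, and both helper functions are assumptions chosen for illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database holding two hypothetical snapshot tables."""
    conn = sqlite3.connect(":memory:")
    for table in ("snap_old", "snap_new"):
        conn.execute(
            f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
        )
    conn.executemany(
        "INSERT INTO snap_old VALUES (?, ?, ?)",
        [(1, "alice", 30), (2, "bob", 25), (3, "carol", 41)],
    )
    conn.executemany(
        "INSERT INTO snap_new VALUES (?, ?, ?)",
        [(1, "alice", 30), (2, "bob", 26), (4, "dave", 19)],
    )
    return conn


def set_difference_changes(conn, old_table, new_table):
    """Return (rows only in the new snapshot, rows only in the old snapshot).

    Table names are interpolated directly, so they must come from trusted code.
    """
    new_only = conn.execute(
        f"SELECT * FROM {new_table} EXCEPT SELECT * FROM {old_table}"
    ).fetchall()
    old_only = conn.execute(
        f"SELECT * FROM {old_table} EXCEPT SELECT * FROM {new_table}"
    ).fetchall()
    return new_only, old_only
```

Rows that appear only in the new snapshot are inserts or the new versions of updated records; rows that appear only in the old snapshot are deletes or the old versions of updated records. Telling the cases apart requires matching the two result sets on the key column.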
Keywords/Search Tags: Data Integration, Incremental Data Detection, Snapshot Differential, MD5