Research And Design On Data Extraction In Multiple Data Sources

Posted on: 2014-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Jia
Full Text: PDF
GTID: 2268330425966822
Subject: Software engineering
Abstract/Summary:
With the spread of information management systems, obtaining information from heterogeneous multi-source data through data mining has become increasingly difficult. Data mining presupposes that data from the heterogeneous sources has been integrated into a data warehouse, which is the job of the ETL (Extract-Transform-Load) process. Data extraction is the first stage of ETL, so improving extraction efficiency is an important part of building a data warehouse.

This thesis surveys existing incremental data extraction techniques and analyzes their advantages and disadvantages. It then proposes an extraction method for heterogeneous environments that combines database transaction log files with entire-table comparison, named the L-C data extraction method. The model for the L-C method is established after studying in detail how the database transaction log is recorded, analyzing the reliability of the log, and analyzing entire-table comparison combined with an MD5 checksum.

In theory, the time complexity of the L-C method is compared with that of other extraction methods; in practice, the method is designed, implemented, and analyzed within an existing system. Both the theoretical and the practical results show that the model is more efficient and stable. It achieves data extraction in a heterogeneous environment with improved efficiency and performance, and provides a solid foundation for data mining in the data warehouse.
Keywords/Search Tags: Data Warehouse, Data Extraction, Transaction Log, Entire Table Comparison
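
The abstract names an MD5 checksum as part of the whole-table step (its "entire table contract", read here as a full-table comparison) but gives no implementation details. The following is a minimal sketch of how MD5-based table comparison can detect incremental changes in general. It is not the thesis's L-C method: the function names, the row layout (primary key first), and the field separator are all assumptions made for illustration, and the transaction-log half of the method is not covered.

```python
import hashlib


def row_digest(values):
    """Return the MD5 hex digest of a row's non-key column values."""
    # Assumption: the unit-separator character never occurs in the data,
    # and NULL (None) is treated the same as an empty string.
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def snapshot(rows):
    """Map primary key -> digest. Each row is a tuple (key, col1, col2, ...)."""
    return {row[0]: row_digest(row[1:]) for row in rows}


def diff_tables(old_snapshot, new_rows):
    """Compare a stored snapshot with the table's current rows.

    Returns (inserted_keys, updated_keys, deleted_keys, new_snapshot).
    """
    new_snapshot = snapshot(new_rows)
    inserted = sorted(k for k in new_snapshot if k not in old_snapshot)
    deleted = sorted(k for k in old_snapshot if k not in new_snapshot)
    updated = sorted(
        k for k in new_snapshot
        if k in old_snapshot and new_snapshot[k] != old_snapshot[k]
    )
    return inserted, updated, deleted, new_snapshot


# Hypothetical data: the table as seen at the previous extraction and now.
previous = snapshot([(1, "alice", 10), (2, "bob", 20), (3, "carol", 30)])
current_rows = [(1, "alice", 10), (2, "bob", 25), (4, "dave", 40)]

inserted, updated, deleted, latest = diff_tables(previous, current_rows)
print(inserted, updated, deleted)  # [4] [2] [3]
```

Storing one digest per row keeps the saved snapshot small, but the comparison still reads every row of the source table on each run, which is the cost a log-based approach aims to avoid.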