Improvement of the Dameng Data Interchange Platform (DMETL) Execution Process

Posted on: 2012-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wang
Full Text: PDF
GTID: 2218330362457825
Subject: Computer technology
Abstract/Summary:
ETL is the core component of a data warehouse: it extracts data from heterogeneous sources, cleans and transforms the data, and finally loads it into the warehouse. The quality of ETL design and development directly affects the construction of the data warehouse and the application of the whole Business Intelligence system. It is therefore of great significance to further improve the indicators of the Dameng Data Interchange Platform (DMETL). Based on an in-depth study and analysis of the principles and mechanisms of the platform, and of the key technologies involved, we identified several shortcomings in the current platform and put forward two improvements.

First, the current platform works serially, which limits its efficiency, so we introduced pipeline technology into the platform. The pipeline is essentially implemented with multi-threading and caching techniques: data extraction, data transformation and data loading run concurrently in three separate thread instances, working like a pipeline in order to reduce the waiting time at the intermediate stages. With this technique the platform makes fuller use of CPU resources, which improves system throughput and ETL efficiency.

Second, all extraction methods of the current platform face a common problem: intrusion into the customer system. Moreover, Oracle data sources are used frequently in actual projects. We therefore designed and implemented an extraction method that obtains incremental data by analyzing log files. We use LogMiner, the log analysis tool provided with Oracle Database, to analyze the database log files and capture the changes to the database recorded in them. By analyzing these change operations, we can capture the incremental data.
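The log-based capture described above can be illustrated with a minimal Python sketch. It is not the thesis implementation and makes no database call: `log_rows` is a hypothetical in-memory stand-in for rows already read from LogMiner output, with dictionary keys named after columns of Oracle's `V$LOGMNR_CONTENTS` view (`SCN`, `OPERATION`, `SEG_NAME`, `SQL_REDO`). The function name `capture_increment` and the use of the SCN as a high-water mark are assumptions made for illustration.

```python
# Hypothetical sketch: select the changes made to one table since the
# last extraction run, using the SCN as a high-water mark.
DML_OPERATIONS = {"INSERT", "UPDATE", "DELETE"}


def capture_increment(log_rows, table, last_scn):
    """Return (redo_statements, new_scn) for changes after last_scn.

    log_rows: iterable of dicts standing in for LogMiner output rows.
    table:    segment (table) name to extract changes for.
    last_scn: SCN reached by the previous extraction run.
    """
    changes = [
        row for row in log_rows
        if row["SEG_NAME"] == table
        and row["OPERATION"] in DML_OPERATIONS
        and row["SCN"] > last_scn
    ]
    # Replay order matters, so sort the changes by SCN.
    changes.sort(key=lambda row: row["SCN"])
    # If nothing changed, the high-water mark stays where it was.
    new_scn = changes[-1]["SCN"] if changes else last_scn
    return [row["SQL_REDO"] for row in changes], new_scn
```

Because the source tables are never queried or modified, a scheme of this kind reads only the log output, which is the property the abstract relies on.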
With log-based extraction, the impact of DMETL on the client system can be effectively reduced. Finally, our experiments show that introducing pipelining improves the efficiency of the Dameng Data Interchange Platform to some extent, and that capturing incremental data through log analysis avoids intrusion into business systems. In addition, this work may help promote the development of the log analysis tools supplied with database products.
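The pipelining idea from the abstract (extraction, transformation and loading running in three threads connected by caches) can be sketched as follows. This is a minimal illustration under stated assumptions, not DMETL's code: the bounded `queue.Queue` objects stand in for the caches, the transformation rule is invented for the example, and `run_pipeline`, `buffer_size` and the row layout are hypothetical names.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the data stream


def extract(rows, out_q):
    """Stage 1: read source rows and pass them downstream."""
    for row in rows:
        out_q.put(row)
    out_q.put(SENTINEL)


def transform(in_q, out_q):
    """Stage 2: clean each row (here: trim and upper-case the name)."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            out_q.put(SENTINEL)
            break
        out_q.put({"id": row["id"], "name": row["name"].strip().upper()})


def load(in_q, target):
    """Stage 3: write transformed rows to the target (a list here)."""
    while True:
        row = in_q.get()
        if row is SENTINEL:
            break
        target.append(row)


def run_pipeline(rows, buffer_size=4):
    """Run the three stages concurrently, linked by bounded buffers."""
    q1 = queue.Queue(maxsize=buffer_size)  # extract -> transform
    q2 = queue.Queue(maxsize=buffer_size)  # transform -> load
    target = []
    threads = [
        threading.Thread(target=extract, args=(rows, q1)),
        threading.Thread(target=transform, args=(q1, q2)),
        threading.Thread(target=load, args=(q2, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target
```

Each stage can start on a row as soon as the previous stage hands it over, instead of waiting for the whole batch to finish. The bounded buffers keep a fast stage from running arbitrarily far ahead of a slow one. In CPython the overlap mainly helps while stages wait on I/O, so this sketch shows the structure rather than the CPU gains reported in the thesis.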
Keywords/Search Tags: Data Warehouse, ETL, pipeline, incremental data extraction, log analysis