
Research on Scalable ETL Technology and Design of ETL Tools

Posted on: 2011-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: J M Huang
Full Text: PDF
GTID: 2178360308463594
Subject: Computer software and theory
Abstract/Summary:
The data warehouse plays an important role in enterprise information systems, and the ETL process is a key part of a data warehouse project. ETL tools extract raw data from heterogeneous data sources, clean and transform it into consistent, customized data, and then load it into the target data warehouse. This thesis focuses on the ETL workflow engine and the scalability of ETL tools, and designs and implements an ETL tool prototype. The main work is as follows:

(1) Research on the ETL workflow engine. The thesis proposes an algorithm that determines the execution priority of the activities in an ETL workflow, together with an algorithm that identifies the activities sharing the same priority. Activities with the same priority are placed in a parallel execution environment, which improves the execution efficiency of the workflow. Experimental results show that the speedup of the parallel algorithm over the serial algorithm approaches the ideal value, provided the number of data records involved is large enough.

(2) Research on the scalability of the ETL tool. To meet the differing requirements of different enterprises, an ETL tool should be scalable. The thesis uses DLL technology to achieve this while guaranteeing the completeness of the metadata.

(3) Design and implementation of the ETL tool prototype. The tool contains three modules: the workflow engine module, the data operation component module, and the ETL process modeling module. The workflow engine is responsible for parsing, executing, and monitoring the ETL process. The data operation component module is an open framework that provides the toolbox of functional components. The process modeling module is used to design the ETL process.
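The abstract does not spell out the priority algorithm of item (1). One common way to realize the same idea is a level-by-level topological sort of the workflow's dependency graph: activities in the same level have no dependencies on each other and can run in parallel. The sketch below is written under that assumption and is not the thesis's own algorithm; the names `priority_levels`, `run_workflow`, and the sample activity names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def priority_levels(activities, edges):
    """Group activities into priority levels.

    `edges` holds (upstream, downstream) pairs. Every activity in a level
    depends only on activities in earlier levels, so one level can be
    executed in parallel.
    """
    indegree = {a: 0 for a in activities}
    successors = defaultdict(list)
    for upstream, downstream in edges:
        successors[upstream].append(downstream)
        indegree[downstream] += 1

    level = [a for a in activities if indegree[a] == 0]
    levels, seen = [], 0
    while level:
        levels.append(level)
        seen += len(level)
        next_level = []
        for a in level:
            for b in successors[a]:
                indegree[b] -= 1
                if indegree[b] == 0:  # all upstream activities are scheduled
                    next_level.append(b)
        level = next_level

    if seen != len(activities):
        raise ValueError("workflow contains a cycle")
    return levels


def run_workflow(levels, run_activity, max_workers=4):
    """Run each level in a thread pool; wait for it before starting the next."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in levels:
            list(pool.map(run_activity, level))  # list() blocks until the level is done


if __name__ == "__main__":
    activities = ["extract_a", "extract_b", "join", "load"]
    edges = [("extract_a", "join"), ("extract_b", "join"), ("join", "load")]
    print(priority_levels(activities, edges))
    # [['extract_a', 'extract_b'], ['join'], ['load']]
```

In this sketch the two extract activities share the first priority level and run concurrently, while `join` and `load` each wait for the preceding level to finish.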

The thesis uses data type mapping to resolve data inconsistencies between different database systems. The proposed algorithms and the scalability design of the ETL tool have some novelty.
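As an illustration of the data type mapping idea, the sketch below routes each source column type through a neutral intermediate type before choosing the target type, so that adding a database system needs only one table of entries per direction. The mapping tables, the neutral type names, and the function `map_type` are assumptions made for this example, not the mappings used in the thesis, and they cover only a handful of types.

```python
# Hypothetical mapping tables: (database, native type) -> neutral type.
TO_NEUTRAL = {
    ("oracle", "VARCHAR2"): "string",
    ("oracle", "NUMBER"): "decimal",
    ("oracle", "DATE"): "timestamp",
    ("mysql", "VARCHAR"): "string",
    ("mysql", "DECIMAL"): "decimal",
    ("mysql", "DATETIME"): "timestamp",
}

# Hypothetical mapping tables: (database, neutral type) -> native type.
FROM_NEUTRAL = {
    ("sqlserver", "string"): "NVARCHAR",
    ("sqlserver", "decimal"): "DECIMAL",
    ("sqlserver", "timestamp"): "DATETIME2",
    ("mysql", "string"): "VARCHAR",
    ("mysql", "decimal"): "DECIMAL",
    ("mysql", "timestamp"): "DATETIME",
}


def map_type(source_db, source_type, target_db):
    """Translate a source column type to the target system via a neutral type."""
    source_key = (source_db.lower(), source_type.upper())
    if source_key not in TO_NEUTRAL:
        raise ValueError(f"no mapping for source type {source_key}")
    neutral = TO_NEUTRAL[source_key]

    target_key = (target_db.lower(), neutral)
    if target_key not in FROM_NEUTRAL:
        raise ValueError(f"no mapping for target type {target_key}")
    return FROM_NEUTRAL[target_key]
```

A real tool would also carry length, precision, and scale alongside the type name; the sketch omits them to keep the lookup structure visible.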
Keywords/Search Tags: ETL, workflow, data extraction, data transformation, component technology