
The Research of ETL Process Improvement

Posted on: 2007-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2178360212965629
Subject: Computer application technology
Abstract/Summary:
Because a data warehouse contains information derived from many separate data sources, it relies on an ETL refresh process to stay consistent with those sources. A complete refresh must process the entire data set, which places a heavy burden on the applications involved and greatly lengthens ETL refresh time; as a result, the data warehouse can neither guarantee refresh efficiency nor respond promptly to changes in its sources. This thesis improves the ETL tool from two directions: (a) reducing the data set that the ETL process must handle, and (b) increasing the amount of work the ETL process completes per unit time. For the first, we implement trigger property extraction and an algorithm for discovering cycles (rings) of cascading triggers, and we introduce the concept of an incremental data source, which combines triggers and delta tables to record update information, and add it to the Uniform Data Model. We also analyze the incremental ETL process to ensure that it remains logically correct. For the second, parallelization combines SPMD (Single Program, Multiple Data) and pipelining: data are first distributed among ETL transformation instances, and each instance then executes internally as a pipeline. The thesis also discusses several issues in parallelization, including data partitioning, load balancing, and parallel optimization. Together, these improvements allow the ETL process to make full use of available computing resources and execute more efficiently.
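The abstract does not spell out the cascading-trigger cycle ("ring") discovery algorithm. As a minimal sketch, assuming the extracted trigger properties have been reduced to a graph in which each table maps to the tables its triggers modify, a standard depth-first search with three-color marking can report one such cycle. The input format, the table names, and the function name find_trigger_cycle are hypothetical; this is a textbook technique, not necessarily the thesis's own algorithm.

```python
from typing import Dict, List, Optional

# Hypothetical input format: table -> tables that its triggers write to.
TriggerGraph = Dict[str, List[str]]


def find_trigger_cycle(graph: TriggerGraph) -> Optional[List[str]]:
    """Return one cascading-trigger cycle as a list of tables, or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(table: str) -> Optional[List[str]]:
        color[table] = GRAY
        path.append(table)
        for target in graph.get(table, []):
            state = color.get(target, WHITE)
            if state == GRAY:
                # target is already on the current path, so a trigger fired on
                # `table` can cascade back to `target`: report that loop.
                return path[path.index(target):] + [target]
            if state == WHITE:
                cycle = visit(target)
                if cycle is not None:
                    return cycle
        path.pop()
        color[table] = BLACK
        return None

    for table in list(graph):
        if color.get(table, WHITE) == WHITE:
            cycle = visit(table)
            if cycle is not None:
                return cycle
    return None


# Illustrative trigger graph (hypothetical table names).
triggers = {
    "orders": ["orders_delta", "audit"],
    "audit": ["stock"],
    "stock": ["orders"],
}
print(find_trigger_cycle(triggers))               # ['orders', 'audit', 'stock', 'orders']
print(find_trigger_cycle({"a": ["b"], "b": []}))  # None
```

A reported cycle marks a set of triggers that could keep firing one another, so such rings would need to be rejected or broken before the triggers are used to feed an incremental data source.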
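The incremental data source idea (triggers plus delta tables) can be illustrated with a minimal sketch, assuming an SQLite source accessed through Python's built-in sqlite3 module: triggers on a hypothetical orders table append every change to a delta table, and an incremental refresh reads only the delta rows beyond a high-water mark instead of rescanning the whole source. The table names, the seq mark, and the dict standing in for the warehouse table are illustrative assumptions, not the thesis's Uniform Data Model.

```python
import sqlite3
from typing import Dict

# Hypothetical source table plus a delta table filled by triggers.
SCHEMA = """
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL);
CREATE TABLE orders_delta (
    seq      INTEGER PRIMARY KEY AUTOINCREMENT,  -- change order / high-water mark
    op       TEXT NOT NULL,                      -- 'I', 'U' or 'D'
    order_id INTEGER NOT NULL,
    amount   REAL                                -- new value; NULL for deletes
);
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_delta (op, order_id, amount) VALUES ('I', NEW.order_id, NEW.amount);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_delta (op, order_id, amount) VALUES ('U', NEW.order_id, NEW.amount);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_delta (op, order_id, amount) VALUES ('D', OLD.order_id, NULL);
END;
"""


def incremental_refresh(conn: sqlite3.Connection, warehouse: Dict[int, float], last_seq: int) -> int:
    """Apply only delta rows newer than last_seq to the warehouse; return the new mark."""
    rows = conn.execute(
        "SELECT seq, op, order_id, amount FROM orders_delta WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, order_id, amount in rows:
        if op == "D":
            warehouse.pop(order_id, None)
        else:  # inserts and updates both store the latest value
            warehouse[order_id] = amount
        last_seq = seq
    return last_seq


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

warehouse: Dict[int, float] = {}
mark = incremental_refresh(conn, warehouse, 0)     # initial load via the delta table
conn.execute("UPDATE orders SET amount = 25.0 WHERE order_id = 2")
conn.execute("DELETE FROM orders WHERE order_id = 1")
mark = incremental_refresh(conn, warehouse, mark)  # reads only the 2 new delta rows
print(warehouse, mark)  # {2: 25.0} 4
```

The second refresh touches two delta rows rather than the full orders table, which is the data-set reduction the abstract aims for. The sketch assumes primary keys are never updated; a key change would need an extra delete record for the old key.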
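The two-level parallelization (SPMD across instances, pipelining within one instance) can be sketched as follows, assuming rows are (key, value) pairs: hash partitioning gives each ETL transformation instance a disjoint subset, every instance runs the same stages (SPMD), and inside an instance each stage runs in its own thread linked by bounded queues (pipelining). The stage functions, the queue size, and n_instances are hypothetical. Python threads share one interpreter lock, so this shows the structure rather than real CPU speedup; a production version would use processes or separate machines.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Row = Tuple[int, str]
Stage = Callable[[Row], Row]
_END = object()  # end-of-stream marker passed down the pipeline


def partition(rows: Sequence[Row], n: int) -> List[List[Row]]:
    """Hash-partition rows by key so each instance gets a disjoint subset."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[0]) % n].append(row)
    return parts


def _run_stage(fn: Stage, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Apply fn to each item until the end marker, then forward the marker."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)
            return
        outbox.put(fn(item))


def pipeline_instance(rows: Sequence[Row], stages: Sequence[Stage]) -> List[Row]:
    """Run one ETL instance: one thread per stage, linked by bounded queues."""
    queues = [queue.Queue(maxsize=64) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    def feed() -> None:
        # Separate feeder thread so a full bounded queue cannot deadlock the collector.
        for row in rows:
            queues[0].put(row)
        queues[0].put(_END)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()
    results: List[Row] = []
    while (item := queues[-1].get()) is not _END:
        results.append(item)
    for t in workers + [feeder]:
        t.join()
    return results


def parallel_etl(rows: Sequence[Row], stages: Sequence[Stage], n_instances: int = 4) -> List[Row]:
    """SPMD: run the same pipeline on every partition, then merge the outputs."""
    parts = partition(rows, n_instances)
    with ThreadPoolExecutor(max_workers=n_instances) as pool:
        outputs = list(pool.map(lambda part: pipeline_instance(part, stages), parts))
    return [row for out in outputs for row in out]


# Hypothetical stages: cleanse, then transform.
stages = [
    lambda r: (r[0], r[1].strip()),
    lambda r: (r[0], r[1].upper()),
]
rows = [(i, f" item{i} ") for i in range(10)]
print(sorted(parallel_etl(rows, stages, n_instances=3)))
```

Each pipeline preserves row order within its partition, but the merged output interleaves partitions, hence the sort before printing. Load balance in this sketch depends entirely on how evenly the hash spreads keys across instances.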
Keywords/Search Tags: ETL process, incremental data source, incremental ETL process, SPMD, pipeline