ETL Workflow Modeling and Optimization

Posted on: 2008-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Ding
GTID: 2208360218450088
Subject: Computer application technology
Abstract/Summary:
Implementing ETL (Extraction-Transformation-Loading) is a central part of data warehouse (DW) development. A large-scale enterprise data warehouse (EDW) must process a vast amount of data within a limited time window, and the data may have complex logical relationships. As a result, ETL execution takes a long time and sometimes cannot finish as scheduled. When that happens, data files pile up and cause a range of problems. To address this, we propose a method for modeling and optimizing ETL workflows.

In this paper, we first introduce an improved model of ETL activities. Based on this model, we study the modeling and optimization of ETL workflows. We treat each ETL workflow as a state and construct the state space through a set of correct state transitions. Optimization over this state space reduces ETL execution time.

Another problem in ETL development is that the execution time of ETL workflows is hard to estimate, which greatly affects the establishment and implementation of service level agreements (SLAs) between developers and customers. For this reason, we propose a time cost model. First, when modeling ETL activities, we apply statistical regression analysis to the relationship between execution time and the volume of data processed. Then we use the critical path algorithm to calculate the time spent executing an ETL workflow scenario. The result can serve as reference data for controlling the ETL workflow implementation.

Finally, to validate the method, we implement the proposed algorithms and run experiments on how measures such as time and the volume of processed states vary. As expected, the optimization method improves ETL workflow performance when the number of activities is within a certain scale.
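
The two steps of the time cost model (regression per activity, then a critical path over the workflow) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state the form of the regression, so a simple linear fit of time against data volume is assumed here, and the activity names, durations, and the helper functions `fit_linear` and `critical_path` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def fit_linear(volumes, times):
    """Ordinary least squares fit of time = a + b * volume (assumed model)."""
    n = len(volumes)
    mean_v = sum(volumes) / n
    mean_t = sum(times) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes)
    sxy = sum((v - mean_v) * (t - mean_t) for v, t in zip(volumes, times))
    b = sxy / sxx
    a = mean_t - b * mean_v
    return a, b


def critical_path(durations, predecessors):
    """Return (total_time, path) for the longest path through the workflow DAG.

    durations:    activity -> estimated execution time
    predecessors: activity -> activities that must finish first
    """
    graph = {node: predecessors.get(node, ()) for node in durations}
    finish = {}     # earliest finish time of each activity
    best_pred = {}  # predecessor that determines each activity's start
    for node in TopologicalSorter(graph).static_order():
        start = 0.0
        best_pred[node] = None
        for pred in graph[node]:
            if finish[pred] > start:
                start = finish[pred]
                best_pred[node] = pred
        finish[node] = start + durations[node]

    # Walk back from the activity that finishes last.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return total, path[::-1]


# Hypothetical history for one activity: (data volume, observed time).
a, b = fit_linear([100, 200, 300], [12.0, 22.0, 32.0])
predicted_join_time = a + b * 400  # estimate for a new data volume

# Hypothetical workflow: two extracts feed a join, which feeds a load.
durations = {"extract_a": 5.0, "extract_b": 3.0, "join": 4.0, "load": 2.0}
predecessors = {"join": ["extract_a", "extract_b"], "load": ["join"]}
total_time, path = critical_path(durations, predecessors)
print(total_time, path)
```

In this sketch, activities that do not lie on the critical path (here `extract_b`) could run longer without delaying the workflow, which is why the critical path length is a natural estimate of the overall execution time when independent activities run in parallel.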
Keywords/Search Tags: ETL workflow modeling, ETL workflow optimization, time cost model, state space, data integration, data warehouse