Optimization And Implementation Of Data Warehouse ETL

Posted on: 2015-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: H M Yin
GTID: 2308330461497204
Subject: Computer technology
Abstract/Summary:
The data warehouse provides a strong basis for enterprise decision support, and ETL (Extraction, Transformation, Loading) is the most important part of building a high-quality data warehouse. ETL must handle huge amounts of data and refresh the data warehouse in a timely manner, so how to process massive data quickly while obtaining high-quality data is a problem well worth studying. ETL tools are widely used in the commercial sector, and their development efficiency is relatively high, but their operating efficiency is unsatisfactory, and ETL workflows are usually not specifically designed before ETL is implemented. To improve the execution speed of a project's ETL, this thesis treats ETL optimization as a state-space search problem: each ETL workflow is regarded as a state, and the best ETL workflow is searched for in the resulting state space. First, a UML class diagram is designed. Second, a new pattern generation algorithm based on the first preceding node is proposed. The thesis then studies and analyzes the conditions for state transitions under the UML structure, and predicts the execution time of a state using linear regression analysis. Finally, the UML structure and the related transformation algorithms are implemented in Java. Through many experiments, the thesis compares the performance of the ordinary pattern generation algorithm with that of the new pattern generation algorithm based on the first preceding node, and the performance of the exhaustive search algorithm with that of the heuristic search algorithm. The optimization theory is then applied to a project: the project's ETL workflow is given to the algorithm as input, and the best ETL workflow is obtained. The experiments show that the UML structure designed by the author and the new pattern generation algorithm based on the first preceding node improve both search efficiency and ETL execution speed, and that the approach can also be applied to practical engineering projects. ...
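To illustrate the state-space idea described in the abstract, here is a minimal Java sketch. It is not the thesis's implementation. It rests on several assumptions that do not come from the source: a workflow is a linear list of activities, the only state transition is swapping two adjacent activities (every swap is treated as legal), and the cost of a state is estimated with a simple per-row cost and selectivity model rather than the linear regression the thesis uses. The names `EtlStateSearch`, `Activity`, `cost`, `neighbours`, `exhaustive`, and `heuristic` are all hypothetical.

```java
import java.util.*;

public class EtlStateSearch {
    // Hypothetical activity: cost per input row, and the fraction of rows it passes on.
    static final class Activity {
        final String name;
        final double costPerRow;
        final double selectivity;
        Activity(String name, double costPerRow, double selectivity) {
            this.name = name;
            this.costPerRow = costPerRow;
            this.selectivity = selectivity;
        }
        @Override public String toString() { return name; }
    }

    // Assumed cost model: each activity pays costPerRow for every row that reaches it.
    static double cost(List<Activity> flow, double inputRows) {
        double rows = inputRows;
        double total = 0.0;
        for (Activity a : flow) {
            total += a.costPerRow * rows;
            rows *= a.selectivity;
        }
        return total;
    }

    // Assumed transition: swap one adjacent pair (a real system would check that the swap is legal).
    static List<List<Activity>> neighbours(List<Activity> flow) {
        List<List<Activity>> out = new ArrayList<>();
        for (int i = 0; i + 1 < flow.size(); i++) {
            List<Activity> next = new ArrayList<>(flow);
            Collections.swap(next, i, i + 1);
            out.add(next);
        }
        return out;
    }

    // Exhaustive search: breadth-first visit of every reachable state, keeping the cheapest.
    static List<Activity> exhaustive(List<Activity> start, double rows) {
        Set<List<Activity>> seen = new HashSet<>();
        Deque<List<Activity>> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        List<Activity> best = start;
        while (!queue.isEmpty()) {
            List<Activity> state = queue.poll();
            if (cost(state, rows) < cost(best, rows)) {
                best = state;
            }
            for (List<Activity> n : neighbours(state)) {
                if (seen.add(n)) {
                    queue.add(n);
                }
            }
        }
        return best;
    }

    // Heuristic search: greedy hill climbing, moving to the first cheaper neighbour until none exists.
    static List<Activity> heuristic(List<Activity> start, double rows) {
        List<Activity> current = start;
        boolean improved = true;
        while (improved) {
            improved = false;
            for (List<Activity> n : neighbours(current)) {
                if (cost(n, rows) < cost(current, rows)) {
                    current = n;
                    improved = true;
                    break;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Activity> start = Arrays.asList(
                new Activity("convert", 2.0, 1.0),
                new Activity("lookup", 5.0, 1.0),
                new Activity("filter", 1.0, 0.25));
        double rows = 1000.0;
        System.out.println("initial    " + start + " cost=" + cost(start, rows));
        List<Activity> e = exhaustive(start, rows);
        System.out.println("exhaustive " + e + " cost=" + cost(e, rows));
        List<Activity> h = heuristic(start, rows);
        System.out.println("heuristic  " + h + " cost=" + cost(h, rows));
    }
}
```

In this toy model both searches move the selective filter to the front of the workflow. The exhaustive search visits every permutation, so its running time grows factorially with the number of activities, while the greedy search evaluates only a few neighbouring states per step. That trade-off is the kind of comparison the abstract describes between exhaustive and heuristic search.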
Keywords/Search Tags:data warehouse, ETL, massive data, ETL state space