The Optimization And Development Of ETL Workflow Process Modeling

Posted on: 2010-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Long
Full Text: PDF
GTID: 2178360275482476
Subject: Software engineering
Abstract/Summary:
ETL is an essential component in constructing and implementing a data warehouse: it integrates data according to uniform rules and thereby enhances the value of the data. ETL must process massive volumes of data, and executing the workflow consumes a large amount of time. Data sources often contain inconsistent data, so data conversion must unify names and formats. Furthermore, an ETL process contains many similar activities, as well as many activities that can be merged or decomposed, and these introduce a large amount of redundancy. ETL workflow modeling and optimization are therefore very important. This paper builds a model of ETL activities, designs the ETL process, and uses state-space search strategies to optimize the workflow.
Because the data sources are inconsistent, we provide a model definition for describing the ETL process that unifies the names and formats of the data, based on a thorough analysis of the data formats of flat-file and Excel data sources. We formally define the model describing the ETL process, define the entities, relationships, and attributes of ETL activities together with a series of logical processes, and set up the corresponding conceptual and logical models. The paper then completes the workflow design of an ETL system that targets the data sources and the data warehouse, and analyzes in detail the function and design of each part of the ETL system.
At the same time, designing and implementing the ETL rules accounts for a relatively large share of the workload; the role of this part is to shield the complexity of the business logic and to provide a unified data interface for analysis and for applications of the data warehouse. Therefore, this paper not only sets up a workflow model that the traditional relational model cannot describe, but also, based on that model, develops the ETL workflow visually using a currently popular technology, the IS package.
In the paper, we propose the overall design of flat-file data processing using the control flow components and data flow components of IS package technology. In addition, we implement the component modules for data cleaning and data conversion, and accomplish the extraction, transformation, and loading of data from the support system into the data warehouse.
The optimization of the ETL process is studied in order to find the ETL process state with the smallest cost. This requires formally defining the ETL workflow, abstracting the ETL workflow activities into state graphs and defining the transitions between states, and proposing a cost model to measure the cost of executing a workflow. All equivalent state graphs are then generated through a series of state transitions, and the state graph with the smallest cost among them is chosen as the optimal ETL process. The ETL process optimization problem can therefore be modeled as a state-space search problem, with each state graph representing a specific design. Moreover, this paper puts forward a model-generating algorithm and an exhaustive search algorithm (ES), on the basis of which it presents a heuristic search algorithm (HS) built on a number of heuristic rules and further improves it, thereby greatly reducing the state space to be searched. The experimental results show that the conversion efficiency of the ETL process is greatly improved.
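The state-space formulation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: a workflow is simplified to a linear sequence of activities rather than a general state graph, the only transition is swapping two adjacent activities that do not depend on each other, and the cost model charges each activity a hypothetical `cost_per_row` for every row it receives, with `selectivity` giving the fraction of rows it passes on. The names `Activity`, `workflow_cost`, `can_swap`, and `exhaustive_search` are likewise hypothetical. The search enumerates every equivalent ordering reachable through swaps and keeps the cheapest one, in the spirit of the exhaustive search (ES) described above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    cost_per_row: float              # assumed cost of processing one input row
    selectivity: float               # assumed fraction of rows passed downstream
    reads: frozenset = frozenset()   # attributes the activity needs
    writes: frozenset = frozenset()  # attributes the activity creates or changes


def workflow_cost(state, input_rows):
    """Sum of (rows reaching each activity) * (its per-row cost)."""
    total, rows = 0.0, float(input_rows)
    for act in state:
        total += rows * act.cost_per_row
        rows *= act.selectivity
    return total


def can_swap(first, second):
    """Adjacent activities may be swapped only if neither touches what the other writes."""
    return (not first.writes & (second.reads | second.writes)
            and not second.writes & first.reads)


def neighbours(state):
    """All states reachable from `state` by one legal adjacent swap."""
    for i in range(len(state) - 1):
        a, b = state[i], state[i + 1]
        if can_swap(a, b):
            yield state[:i] + (b, a) + state[i + 2:]


def exhaustive_search(initial, input_rows):
    """Visit every equivalent state and return (best state, its cost, states visited)."""
    initial = tuple(initial)
    seen = {initial}
    queue = deque([initial])
    best, best_cost = initial, workflow_cost(initial, input_rows)
    while queue:
        state = queue.popleft()
        cost = workflow_cost(state, input_rows)
        if cost < best_cost:
            best, best_cost = state, cost
        for nxt in neighbours(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best, best_cost, len(seen)


# Hypothetical three-activity workflow over 1000 input rows.
convert_date = Activity("convert_date", 2.0, 1.0,
                        frozenset({"date"}), frozenset({"date"}))
lookup_key = Activity("lookup_key", 3.0, 1.0,
                      frozenset({"customer"}), frozenset({"customer_key"}))
not_null_filter = Activity("not_null_filter", 1.0, 0.5, frozenset({"amount"}))

initial = [convert_date, lookup_key, not_null_filter]
best, best_cost, explored = exhaustive_search(initial, 1000)
print([a.name for a in best], best_cost, explored)
```

In this toy example the cheapest ordering moves the filter to the front, because it halves the number of rows the more expensive activities have to process. The number of states visited grows quickly with the number of activities, which is the motivation for pruning the search with heuristic rules; this sketch does not attempt to reproduce the heuristic rules of the thesis.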
Keywords/Search Tags: ETL modeling, ETL optimization, Conceptual model, Logical model, Workflow, Data warehouse, State space