Research on Data Extraction and ETL Task Scheduling Based on BI

Posted on: 2019-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: X X Yu
Full Text: PDF
GTID: 2428330578972763
Subject: Computer technology
Abstract/Summary:
The rise of business intelligence (BI) has become an important factor in promoting China's economic development. As a complete set of solutions, BI operates against a background of multiple business lines and massive data volumes, and it involves incremental data extraction, ETL task scheduling, and related problems. Traditional timestamp-based incremental data extraction executes slowly, wastes memory, and cannot distinguish insert operations from update operations. Meanwhile, a variety of scheduling algorithms have been applied to improve ETL execution efficiency, but the total execution time is still too long. The simulated annealing algorithm can allocate tasks reasonably and improve execution efficiency.

To address incremental data extraction and ETL task scheduling in ETL processing, two optimization strategies are proposed. First, at the micro level, traditional timestamp-based incremental extraction is improved by adding a snapshot table and two timestamp fields, using those two fields to track insertions and updates, and applying a corresponding method for deletions. This addresses the low efficiency and memory waste of the traditional approach. Second, at the macro level, the ETL task scheduling problem is described with a mathematical model, and the simulated annealing algorithm is used to allocate tasks so that the overall ETL workflow takes less time, which improves the efficiency of the whole ETL process.

To verify the effectiveness of the optimization strategies, a comparative experiment was designed for each of the two schemes. Experimental results on real data sets show that the optimized timestamp method improves the efficiency of incremental data extraction, relieves the memory waste, and distinguishes insert operations from update operations. For scheduling, the total execution time of the ETL workflow under the simulated annealing algorithm is compared with that under the polling scheduling algorithm and the greedy algorithm. The results indicate that the simulated annealing algorithm is a suitable choice for the ETL task scheduling problem and has a clear overall advantage over the other two algorithms.
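The abstract does not give the schema or the exact rules of the optimized timestamp method, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each source row carries two hypothetical timestamp fields, `created_at` and `updated_at`, and that the snapshot table can be reduced to the set of primary keys seen at the previous extraction. All names (`Row`, `extract_increment`, `last_run`) are illustrative and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    key: int
    value: str
    created_at: int  # assumed field: time the row was first inserted
    updated_at: int  # assumed field: time the row was last modified


def extract_increment(source, snapshot_keys, last_run):
    """Classify the changes made since the previous extraction.

    source        -- iterable of Row objects currently in the source table
    snapshot_keys -- set of keys recorded at the previous extraction
    last_run      -- timestamp of the previous extraction
    """
    inserts, updates = [], []
    current_keys = set()
    for row in source:
        current_keys.add(row.key)
        if row.created_at > last_run:
            # Created after the last run: a new row, whatever updated_at says.
            inserts.append(row)
        elif row.updated_at > last_run:
            # Existed before the last run but was modified since.
            updates.append(row)
    # Keys in the old snapshot that no longer exist were deleted.
    deletes = sorted(snapshot_keys - current_keys)
    # current_keys becomes the snapshot for the next run.
    return inserts, updates, deletes, current_keys
```

With two timestamps per row, a single pass separates inserts from updates, which a single last-modified timestamp cannot do. Comparing key sets is one simple way to detect deletions; the thesis may handle them differently.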
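The scheduling comparison can be illustrated in the same spirit. The abstract does not state the mathematical model, so this sketch assumes the simplest plausible one: independent ETL tasks with known durations, identical workers, and the makespan (the finish time of the busiest worker) as the objective. "Polling" is read here as round-robin assignment. The cooling parameters (`t_start`, `t_end`, `alpha`) are arbitrary illustrative values, not the ones used in the thesis.

```python
import math
import random


def makespan(assignment, durations, n_workers):
    """Total workflow time: the load of the busiest worker."""
    loads = [0.0] * n_workers
    for task, worker in enumerate(assignment):
        loads[worker] += durations[task]
    return max(loads)


def round_robin(durations, n_workers):
    """Polling baseline: task i goes to worker i mod n_workers."""
    return [i % n_workers for i in range(len(durations))]


def greedy(durations, n_workers):
    """Greedy baseline: each task goes to the currently least-loaded worker."""
    loads = [0.0] * n_workers
    assignment = []
    for d in durations:
        w = loads.index(min(loads))
        assignment.append(w)
        loads[w] += d
    return assignment


def simulated_annealing(durations, n_workers, t_start=10.0, t_end=1e-3,
                        alpha=0.995, seed=0):
    """Search for a low-makespan assignment; assumes at least one task."""
    rng = random.Random(seed)
    current = round_robin(durations, n_workers)
    current_cost = makespan(current, durations, n_workers)
    best, best_cost = current[:], current_cost
    t = t_start
    while t > t_end:
        # Neighbour move: reassign one random task to a random worker.
        candidate = current[:]
        candidate[rng.randrange(len(durations))] = rng.randrange(n_workers)
        cost = makespan(candidate, durations, n_workers)
        delta = cost - current_cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost
```

Because the search starts from the round-robin assignment and keeps the best solution seen, its result is never worse than the polling baseline under this model. Whether it beats the greedy baseline depends on the instance and the cooling schedule, which is what the comparative experiment in the thesis measures.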
Keywords/Search Tags: ETL, Incremental Extraction, Timestamp, Task Scheduling, Simulated Annealing Algorithm