Research On ETL Cluster Task Scheduling Algorithm Based On Improved GA-ACO

Posted on: 2024-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Liu
GTID: 2568307073962189
Subject: Control Science and Engineering
Abstract/Summary:
As the task scenarios faced by ETL systems grow more complex and task volume and its random fluctuations increase, existing ETL cluster task scheduling algorithms lack real-time scheduling capability, and their final solutions have a high probability of converging to a local optimum. This can easily lead to serious load imbalance and prolonged task execution time. To improve the real-time scheduling capability and execution efficiency of the ETL task scheduling mechanism, this thesis proposes an ETL cluster task scheduling algorithm based on an improved GA-ACO. The main work is as follows:

(1) ETL cluster task prediction algorithm design. An ARIMA-based prediction model is established to address real-time scheduling in data fluctuation scenarios and to achieve real-time prediction of ETL cluster tasks. The research covers the acquisition of historical data, algorithm modeling, and data output; the focus is on abstracting ETL cluster tasks into historical time series for modeling, so as to achieve accurate and stable prediction. The experimental results show that ARIMA has higher precision, more stable prediction performance, and higher accuracy than the experimental comparison model.

(2) ETL cluster task assignment algorithm design. An ETL cluster task assignment algorithm based on the improved GA-ACO is designed, with three main improvements. First, greedy initial solution optimization: exploiting the fast convergence of the greedy algorithm, tasks are traversed sequentially and assigned quickly to obtain a high-quality initial solution that is closer to the optimal solution than a random initial solution. Second, PSO-based mutation optimization: the particle swarm algorithm makes fuller use of the feedback information shared among particles, and this feedback is used to adjust the mutation factor of the GA, which reduces the probability of ineffective mutations and increases the convergence speed. Third, ACO-based adaptive pheromone optimization: an adaptive pheromone mechanism is established for the ACO so that the ant colony searches around the optimal path as much as possible. The experimental results show that the task assignment algorithm based on the improved GA-ACO is more load-balanced and more stable than the experimental comparison algorithm, and has the shortest total time consumed.

(3) Algorithm fusion. An ETL cluster task scheduling algorithm that fuses the ETL cluster task prediction algorithm with the task assignment algorithm is implemented. In the initial stage, a task volatility discriminator judges the volatility of the tasks; based on the result, the tasks either go through task prediction or are output directly to the task assignment stage. After task assignment is completed, the actual task execution time is collected and fed to the history time update program to update the historical time series. The final step outputs the solution, completing the algorithm fusion. The experimental results show that the improved GA-ACO-based ETL cluster task scheduling algorithm can adapt to real-time ETL task scenarios, with shorter task execution time and better load balancing than the experimental comparison algorithm.
Keywords/Search Tags: ETL cluster, Task scheduling algorithm, ARIMA, GA-ACO
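
The greedy initial solution described in item (2) of the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not give the cost model, so the sketch assumes each task has an estimated cost and each node a relative speed, and it assigns every task, in order, to the node that would finish it earliest. The names `greedy_initial_assignment`, `node_loads`, `task_costs`, and `node_speeds` are hypothetical.

```python
from typing import List, Sequence


def greedy_initial_assignment(task_costs: Sequence[float],
                              node_speeds: Sequence[float]) -> List[int]:
    """Return, for each task in order, the index of the node it is assigned to.

    Assumption (not from the thesis): running a task of cost c on a node of
    speed s takes c / s time units, and each node runs its tasks one by one.
    """
    if not node_speeds:
        raise ValueError("at least one node is required")
    finish_times = [0.0] * len(node_speeds)  # accumulated busy time per node
    assignment: List[int] = []
    for cost in task_costs:
        # Greedy step: pick the node with the earliest resulting finish time.
        best = min(range(len(node_speeds)),
                   key=lambda n: finish_times[n] + cost / node_speeds[n])
        finish_times[best] += cost / node_speeds[best]
        assignment.append(best)
    return assignment


def node_loads(task_costs: Sequence[float],
               node_speeds: Sequence[float],
               assignment: Sequence[int]) -> List[float]:
    """Total execution time accumulated on each node under an assignment."""
    loads = [0.0] * len(node_speeds)
    for cost, node in zip(task_costs, assignment):
        loads[node] += cost / node_speeds[node]
    return loads


if __name__ == "__main__":
    costs = [8, 4, 6, 2]
    speeds = [1.0, 2.0]
    plan = greedy_initial_assignment(costs, speeds)
    print(plan)                             # [1, 0, 1, 0]
    print(node_loads(costs, speeds, plan))  # [6.0, 7.0]
```

In a GA-ACO setting, an assignment like this would presumably seed the initial population in place of a purely random solution. How the thesis encodes solutions and measures load balance is not stated in the abstract.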