Font Size: a A A

Research And Implementation Of Dynamic Load Balance Based On Data Migration In Large Graph Processing Systems

Posted on:2015-06-21Degree:MasterType:Thesis
Country:ChinaCandidate:J L ShiFull Text:PDF
GTID:2308330482957279Subject:Computer technology
Abstract/Summary:PDF Full Text Request
In the past few years, the fiery-hot cloud computing has solved many massive data processing problems, in order to improve the efficiency of large-scale data iterate processing, BSP model is widely used in distributed parallel processing of large graph data, and provides ideas for designing and developing large-scale graph data processing systems. However, because of the machine performance and the graph data skew problem, in each execution super-step, the workload of different tasks may differ. Because individual tasks within the same job runs very slowly, and then delays the execution time of the job, which may have a direct impact on the resource utilization and the parallel computing performance of the entire system. Thus an important assignment for large-scale graph data in a distributed processing cluster environment is dynamic load balancing, which is the main work of this thesis.In order to solve those problems, we studied and implemented a dynamic load balancing mechanism based on the data migration. The overall design goal is to make the balance workload on each task as much as possible, reduce the condition of the "bucket effect" generated by the short-board task, and thus effectively reduce the running time cost of local computing process in the system. The main contributions are as follows:(1) Through analyzing the iterative characteristics of a large graph, graph algorithms can be divided into two categories:the homogeneous convergence graph algorithms and the non-homogeneous convergence graph algorithms, then we analyze the process of the status transition on corresponding algorithms. (2) We make a workload modeling based on the Markov model, design and implement a multi-step workload prediction model based on online learning, utilize the dynamic changes of the graph data scale in the iterative process and the relationship of sending messages between vertices and edges, then we research the prediction of the active vertices number and the prediction of the running time of each task in each super-step and the method of selecting the short-board task. (3) According to the different migrate data granularity, we design and implement a flexible data migration technology. There are three data migration strategies, namely bucket migration, vertices migration and edges migration strategy, including the migrate procedure, the selection strategy of the migrated data, the costs model and the routing table. The three migration strategies have their own advantages and disadvantages; we can select the appropriate migration strategy based on the requirement of user or the density of the graph data.Applying the dynamic load balancing mechanism proposed by this thesis into the BC-BSP system, we take a series of experiments. The experiment proved that the mechanism has achieved the desired results. The average accuracy of prediction models could up to 70%, which 40% higher than Catch The Wind. We compared the migration data scale, time cost, and overall gains for the three different migration strategies. The three kinds of migration policies have outstanding performance, which could prefect balance the workload and reduce running time of the whole system.
Keywords/Search Tags:graph process, parallel and iterative process, dynamic workload balancing, data migration, distributed system
PDF Full Text Request
Related items