Font Size: a A A

The Research Of Load Balance Techniques And Accumulative Iterative Algorithms In Large Scale Asynchronous Graph Processing Framework Maiter

Posted on:2016-05-05Degree:MasterType:Thesis
Country:ChinaCandidate:C L WangFull Text:PDF
GTID:2428330542457253Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the development of electronic commerce and social networks,the role of large-scale distributed graph processing becomes more and more important.It is widely used in the applications of link analysis,product recommendations,and so on.Maiter as a fully asynchronous large-scale distributed graph processing framework,adopts DAIC model,which can avoid synchronization overhead,increase the convergence speed,and greatly improve the efficiency of large graph processing.In order to further improve usability and applicability of Maiter,we carry out the work in two aspects.To improve the usability of Maiter,we adopt the centralized dynamic load balancing mechanism to solve the load balancing problem of Maiter.In the implemention of load balance mechanism,we solve many problems,which can be summarized as follows.1)We propose the data management based on the Hash data blocks and message management mode based on Hash assigning to bucket,great convenience to data migration and migration data relocation in the process of load balancing.2)We prove that the computation delay of migration data as a result of data migration in the process of calculation,don't affect the final computation results.3)We propose message transfer mechanism to deal with the messages on the flying in the process of data migration,and prove the correctness of the proposed mechanism in theory.4)On the basis of above work,we implement the centralized dynamic load balancing mechanism based on data block on Maiter.The experimental results show that the load balancing mechanism almost does not increase the work load when skew does not exist in the cluster and can be effective to deal with the load skew and improve the whole computational efficiency of the Maiter.To improve the applicability of Maiter,1)we propose Asyn-SimRank,which adopts the iterate-cumulate method,asynchronously executing the core iterative process in order to avoid the high-cost synchronization barriers in large-scale distributed environments,and effectively reduce the amount of computation and communication.2)We propose the keypoint-prior computation to accelerate global convergence.3)We prove the accuracy and the convergence of the Asyn-SimRank algorithm and efficiency of the keypoint-prior computation.4)We then implement Asyn-SimRank on Maiter,which is a distributed framework supporting asynchronous iteration.Our results show that,comparing with the SimRank and Delta-SimRank implementing on Hadoop and Spark,the large-scale Asyn-SimRank significantly promotes the computational efficiency and accelerates the convergence.Though a series of work this paper,we further improve the usability and universality of Maiter framework,created favorable conditions for the practical application of Maiter framework.
Keywords/Search Tags:load balancing, data migration, asynchronous computation, iterative computation, Asyn-SimRank, similarity, big data, MapReduce, Maiter
PDF Full Text Request
Related items