Font Size: a A A

Research On Performance Optimization For Distributed Graph Computation

Posted on:2019-08-27Degree:MasterType:Thesis
Country:ChinaCandidate:M H JiaFull Text:PDF
GTID:2428330611993626Subject:Software engineering
Abstract/Summary:PDF Full Text Request
As a typical abstraction of the real world,graph plays an important role in machine learning,artificial intelligence,knowledge mapping and other related fields.With the continuous growth of the scale of graph data,single computer can not meet the needs of actual computing,and the related research in the field of distributed graph computing has been developed rapidly.For the distributed graph computing system,how to balance the computing load between the nodes in the system and how to reduce the communication overhead in the system are the major challenges and difficulties in the field of distributed graph computing.So optimizing the framework of distributed graph computing,reducing redundant computing and redundant communication are the key problems to optimize the performance of distributed graph computing system.The main contents of this paper include:Firstly,the existing optimization techniques of Graph 500 benchmarking program are deeply studied and analyzed.When analyzing its technical advantages,it also reveals the inherent defects of optimization technology and proposes corresponding optimization methods.Secondly,the existing RDF query systems are deeply studied and analyzed.According to the different storage forms of the underlying data of different systems,the storage overhead,computing overhead and query efficiency are analyzed.The optimization direction is proposed for the shortcomings of various systems.Thirdly,aiming at the deficiency of a large number of redundant communication in Graph 500 benchmark program,this paper analyzes the shortcomings of various optimization algorithms,and proposes a graph calculation optimization method PruX based on communication pruning.By studying the Graph 500 benchmark program,the causes of communication in the system are analyzed,and a large number of redundant communication can be removed in the system.By reducing the redundant communication inherent in the system,the performance of BFS algorithm can be improved.Specifically,the PruX algorithm records the vertex state in each computing node.By pre-detecting the vertex access state,the communication overhead in the system is reduced and the efficiency of the algorithm is improved.Aiming at the problem of excessive storage overhead caused by PruX algorithm,this paper proposes a PruX algorithm based on 2D partitioning,which is used to solve the bottleneck of excessive storage overhead of PruX in large-scale graph data,increases the scalability of PruX algorithm,and proposes corresponding optimization methods to solve the extra communication overhead in the process of 2D partitioning.Fourthly,through exploring the related content of RDF query,this paper puts forward a RDF query optimization method based on data storage compression.In the underlying data storage,the data storage form is reorganized,and the corresponding data access path is modified to reduce the number of comparisons required for each query,reduce the workload,reduce the computational load,thus improving the performance of the algorithm.Specifically,the data with the same subject is placed in the same data structure,and all data share the same subject,and the data with the same predicate is placed in the same data structure.In this case,all data in the data share the subject and predicate.When the query is executed with missing objects,the corresponding subject and predicate shared area is found,and the object is output.When the subject is missing,the operation is similar.If the predicates are re-stored in the corresponding form,the operations with missing predicates can also be queried in the same way,and the broadcast operation caused by the predicate index can be avoided,which reduces the additional communication overhead and the computational overhead in the query process.This paper explores the performance optimization of distributed graph computing system,and the research results have certain theoretical value and guidance value for the performance improvement of distributed graph computing system.Finally,the performance advantages of PruX algorithm and data storage compression method are confirmed by experiments.
Keywords/Search Tags:Distributed Graph Computation, Breadth-First Search, Communication Prunning, RDF Query, Data Compression
PDF Full Text Request
Related items