Design And Implementation Of Query Acceleration Algorithm In Petuum Graph Computing System

Posted on: 2020-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Bi
GTID: 2428330602455500
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, graph data such as social networks has grown rapidly in scale, and the volume of queries over it has grown as well. Efficiently querying hyperscale graph data nevertheless remains an open problem, which calls for query acceleration algorithms with higher performance. Mature big data platforms such as MapReduce and Spark exist, but graph data has distinctive characteristics: the strong dependence between nodes requires a large amount of data transmission between them, and graph computation involves a large number of iterative steps. These characteristics make graph queries a poor fit for the current mainstream big data platforms, so a high-performance query acceleration algorithm is needed.

The main difficulty in developing such an algorithm is improving performance without giving up scalability. Known graph computing systems are generally distributed in order to scale well, but there is still no standard approach to improving their performance. Under the scalability requirement, current graph query algorithms incur heavy computation and communication costs on large-scale graph data; in distributed graph computing systems in particular, communication time sometimes determines system performance entirely. A query acceleration algorithm for graph computing systems that improves performance while preserving scalability is therefore an urgent need in the field.

To address these problems, this thesis studies how to improve query performance on large-scale graph data while preserving the scalability of the graph computing system. It designs and implements a query acceleration algorithm based on the SSP (Stale Synchronous Parallel) parallel computing model, using in-memory computing and memory sharing, which effectively guarantees the real-time performance and scalability of the graph computing system. To improve query performance, the algorithm builds on the SSP parallel computing framework to execute all queries in parallel and uses in-memory computing to accelerate the computation. To improve ease of use, queries are written as SQL-like statements that conform to an EBNF grammar and are parsed with boost::spirit::qi. To improve dynamic scalability, a Kafka message queue is introduced so that the in-memory graph structure can be added to, deleted from, and modified dynamically, which largely achieves stream computing. To reduce redundant computation, the similarity between queries is exploited by sharing memory among them.

Experiments evaluated the proposed query acceleration algorithm in terms of real-time performance, portability, ease of use, scalability, load balancing, memory consumption, and computing power. They lead to the following conclusions:

1. The proposed algorithm uses the SSP parallel computing framework to parallelize all query processes and uses in-memory computing to speed up graph queries, which meets the real-time requirements. Data communication uses the ZMQ standard, so the algorithm can run directly on any distributed platform that supports ZMQ, which avoids the problems caused by platform migration and gives good portability.
2. The proposed algorithm uses the parsing tool boost::spirit::qi to parse query statements that conform to the EBNF grammar, which makes the system easy to use and lowers the barrier for users. It also uses a Kafka message queue to update and modify in-memory data promptly, which provides the conditions for stream computing and supports later expansion of the cluster with additional machines, giving good scalability.
3. The proposed algorithm executes all queries in parallel, which reduces node idle time, avoids the time wasted when running time is unevenly distributed across nodes, and gives good performance and load balance.
4. The proposed algorithm shares memory between similar queries: the results of computations that recur across queries are placed in shared memory, which avoids the time cost of recomputing them. This both reduces memory consumption and speeds up the query process.

In summary, the proposed query acceleration algorithm achieves a good balance among real-time performance, portability, ease of use, scalability, load balancing, memory consumption, and computing power. The work also has some theoretical value and can serve as a reference for similar work.
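
The abstract does not describe how the SSP (Stale Synchronous Parallel) model is used inside the system, so the following is only a minimal sketch of the general SSP staleness rule, not the thesis's implementation. It assumes one common formulation: a worker may start its next iteration only while it is at most `staleness` iterations ahead of the slowest worker. The names `SSPClock`, `can_start_next`, `tick`, and `simulate` are hypothetical and chosen for illustration.

```python
from typing import List


class SSPClock:
    """Per-worker iteration counters with a staleness bound (illustrative only)."""

    def __init__(self, num_workers: int, staleness: int) -> None:
        if num_workers <= 0 or staleness < 0:
            raise ValueError("need num_workers > 0 and staleness >= 0")
        # clocks[i] = number of iterations worker i has completed so far
        self.clocks: List[int] = [0] * num_workers
        self.staleness = staleness

    def can_start_next(self, worker: int) -> bool:
        # Assumed SSP rule: a worker may run ahead of the slowest worker
        # by at most `staleness` iterations; staleness == 0 behaves like
        # a barrier after every iteration (bulk synchronous execution).
        return self.clocks[worker] - min(self.clocks) <= self.staleness

    def tick(self, worker: int) -> None:
        # Record that `worker` has completed one more iteration.
        if not self.can_start_next(worker):
            raise RuntimeError("worker must wait for slower workers")
        self.clocks[worker] += 1


def simulate(staleness: int, fast_steps: int) -> int:
    """Count iterations worker 0 completes while worker 1 makes no progress."""
    clock = SSPClock(num_workers=2, staleness=staleness)
    done = 0
    for _ in range(fast_steps):
        if not clock.can_start_next(0):
            break  # worker 0 would now block until worker 1 catches up
        clock.tick(0)
        done += 1
    return done
```

Under these assumptions, `simulate(0, 5)` returns 1 and `simulate(2, 5)` returns 3: a larger staleness bound lets a fast worker keep computing instead of idling, which is the kind of reduced waiting the abstract attributes to parallel execution under SSP.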
Keywords/Search Tags:Petuum Graph Computing System, Query acceleration, SSP, Memory calculation, Memory sharing