Font Size: a A A

Research And Implementation Of Subgraph Isomorphism Algorithm Based On Spark

Posted on:2019-01-03Degree:MasterType:Thesis
Country:ChinaCandidate:F Y MaFull Text:PDF
GTID:2428330563953728Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The development of Internet has given birth to the era of big data.The collection and arrangement of massive data has developed into a new research area with great potential for development.As an important data structure,graph is the first kind of solution for large-scale structured data representation,which is often used to build interesting models to solve problems and is widely used in many fields.Therefore,it is of great importance to efficiently and effectively process graph structure data.In the processing of graph structure data,the query to the graph is the most basic and most important operation.One of the methods to construct this operation is the problem of subgraph isomorphism,that is,given a query graph q and a data graph g,and then find a graph that is exactly the same as the query graph q topology in the data graph g.However,subgraph isomorphism is a NP complete problem.The process of processing is extremely timeconsuming,especially when the size of the graph structure is large,and its processing time is increasing exponentially.In order to solve this problem,the following two aspects are discussed in this paper.First,the existing subgraph isomorphism algorithm is improved.Ullmann algorithm is the first algorithm to solve the problem of subgraph isomorphism.Many existing subgraph isomorphism algorithms are improved on the basis of it.The Ullmann algorithm is based on the backtracking method.In the process of traversing the search tree,the consistency matrix is filtered and refined,and the mapping matrix is judged according to the isomorphic condition.In this paper,the improvement of the algorithm is based on the filtering process of Ullmann algorithm.One is to change the original filter step for neighbor filtering,which can also filter the neighbor of the selected vertex;The two is to improve the original simplification process to a partial simplification,because the simplification process is extremely time-consuming and it will be effective to reduce the number of simplification process reasonably.To improve the efficiency of the algorithm,the three is to join the selection order of the query graph vertices.Second,improving the processing platform,increasing the size of the data set and the inherent complexity of the subgraph isomorphism are all new requirements for the processing platform.Single machine linear processing cannot meet the needs of data processing.This paper proposes the operation of the algorithm in the Spark cluster.The Spark platform is based on the RDD elastic distributed data set,enriches the operation of the data set,and has the fast processing ability and good fault tolerance,and is especially good at the iterative operation of the large data set.Through experimental comparison,it is proved that the improved algorithm improves the efficiency of the original Ullmann algorithm well,and can be well extended to the Spark cluster,and also provides a new solution for the large-scale graphics data problem.
Keywords/Search Tags:subgraph isomorphism, Ullmann algorithm, Spark, neighbor filtering, partial refinement, vertex sorting
PDF Full Text Request
Related items