Font Size: a A A

Research Of Extended Personalized Subgraph Search Problem On Large Scale Graphs

Posted on:2019-05-18Degree:MasterType:Thesis
Country:ChinaCandidate:Y R HuFull Text:PDF
GTID:2370330569996081Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The graph model describes entities and relationships by points and edges.Compared with other data structures,it describes the complex relationship among transactions more easily and effectively.Therefore,the graph model is widely used in many fields to solve practical problems.With the massive growth of data in modern era,solving the problem with the large-scale graph model attracts more and more attention.As one of the important sub-problems in graph theory,the subgraph search problem has far-reaching research significance and value.At present,the algorithms for solving subgraph search problems are mostly only applied to the non-rights graph and not very good on large scale data graph.Although some researches have proposed to solve the subgraph search on large scale graph in parallel in recent years,the existing algorithms are inefficient in some common limited queries and may not even meet the query requirements.In this paper,a kind of limited query in subgraph search problems is innovated and improved.The main research contents are as follows:Do expansion and the formal definition on the existing personalized subgraph search problem.In the existing subgraph search problems,the query graphs all have unique attribute values.The personalized subgraph search problem also requires that the weights of some of the regions be greater than the threshold set by the user.In the extended personalized subgraph search problem,the nodes of the query graph are divided into two types: one type is that the attribute value of the node is unique;and the other type is that the attribute value of the node is optional.All the nodes in data graph have unique attribute values.The extended personalized subgraph searching is looking for subgraphs in the data graph that satisfy: 1)The attribute values of the first type of nodes in the query graph are the same as the attribute values of the corresponding nodes in the subgraph;2)One of the attribute values of the second type node in the query graph is the same as the attribute value of the corresponding node in the subgraph;3)Part of the sub-region weight is greater than the user-set threshold.Aiming at the extended personalized subgraph search problem,this paper proposes DSSA algorithm based on double index.Use pruning,fixed-value domain search,nonfixed-domain search sequence to search the subgraph that satisfy the query conditions.At the same time,build CP index and Vin index to speed up pruning and searching.CP index classifies adjacent nodes of the same type into the same area,and builds index for the node type,the number of internal nodes and internal weights in every area.The Vin index builds an index on the nodes in adjacent regions of the value node in the data graph.Extend DSSA algorithm on the distributed environment based on MapReduce.After analyzing the problems that may arise in the distributed environment,the DLP algorithm is proposed to divide and store the large-scale data graph in the stage of data graph processing.Subsequently,the DSSA algorithm is extended to the PDSSA algorithm to solve the extended personalized subgraph search problem in parallel.
Keywords/Search Tags:data graph, subgraph search, index, graph partition, MapReduce
PDF Full Text Request
Related items