Font Size: a A A

Cohesive Subgraph Search Over Large Graphs

Posted on:2023-07-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:W S LuoFull Text:PDF
GTID:1520307097996709Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of big data,large-scale graph data plays a key role in various fields.Finding constructive structures in different types of graphs to obtain functional information is a problem that has received extensive attention.Cohesive subgraphs,a topological structure composed of a set of closely connected vertices in a graph,have been widely used in social,biological,physical chemistry,finance,and other fields due to their intuitive and high scalability.Currently,various types of cohesive subgraphs have been studied in both academia and industry.However,as the scale and type of data increase,existing cohesive subgraph models cannot meet the needs of new applications,and the computations are inefficient.To meet the needs of different applications and improve the query efficiency of cohesive subgraph search,this paper focuses on the three most common graphs: simple graph,uncertain graph,and bipartite graph,and conducts in-depth research on cohesive subgraph search problems in different types of large-scale graphs.That is,querying influential communities,an important cohesive subgraph model,on simple graphs and uncertain graphs,and finding the maximum quasi-biclique on bipartite graphs.Referring to a large number of existing works,we present novel subgraph models for the shortcomings of existing subgraph models and propose efficient subgraph query algorithms according to the attributes and structural characteristics of different graphs.The contributions of this paper are as follows.In simple graphs,this paper first analyzes the inadequacy of the constraint that all vertices have different weights in the current influence community search problem.Specifically,the constraint limits the search scene of the influential community search,and existing algorithms will output incorrect results in cases where vertices have the same weights.To alleviate this problem,we remove this constraint and study the more general case of influential community search.For vertices with the same weight,the connectivity between vertices is judged to avoid affecting the result.Meanwhile,to further improve the efficiency of the algorithm,we propose a method based on Union-Find to quickly judge the connectivity between vertices.Subsequently,we conduct extensive experiments on the proposed method and existing methods on a large-scale real dataset,and the experimental results show that our method can effectively deal with vertices with the same weight and guarantee the accuracy of the obtained influential communities.In uncertain graphs,this paper proposes a novel model of uncertain influential communities based on the uncertainty of edges to effectively describe communities in uncertain graphs for the first time.To compute the influential communities of uncertain graphs,we first propose an online approach based on vertex peeling,which obtains the influential communities by iteratively deleting the vertex with the smallest weight in the graph and updating the neighbor information of the deleted vertex.To further improve the query efficiency,we propose a new index and index-based query method.Subsequently,we organize the index into a tree-like structure to improve its query efficiency.At the same time,to reduce the space occupation of the index,we also propose two optimization methods: community merging and community compression.The former can merge the communities stored repeatedly between different nodes in the index,while the latter can compress the community vertices stored in the same node.Finally,the experiments on large-scale real data sets compare the differences between the influential community model of uncertain graphs and simple graphs and evaluate the efficiency of the algorithms.Experimental results show that our proposed index-based method runs up to two orders of magnitude faster than the online methods.In bipartite graphs,this paper studies the problem of finding a typical quasi-bipartite model called the maximum k-biplex,which is a maximal k-biplex of a bipartite graph with the largest number of edges.We first prove that the problem is an NP-hard problem and propose the MBS algorithm to solve it.The MBS algorithm enumerates all possible maximal k-biplexes as candidate sets by backtracking and then selects the largest k-biplex among them as the output.To boost the enumerating processing,we propose two pruning strategies based on k-biplex characteristics to remove candidate sets that are unable to be maximal kbiplex.To further improve the search efficiency,we propose a core-based graph reduction method,which can reduce the graph to a series of smaller subgraphs that may contain the maximum k-biplex.At the same time,to avoid computing all subgraphs,we investigate the upper and lower bounds on the size of subgraphs that may contain k-biplexes,thereby avoiding computing subgraphs that are unable to contain the maximum k-biplex.In addition,we also propose a heuristic algorithm and a parallel algorithm to further meet the requirements of efficiency in practical needs.Experimental results on a large number of real data sets show that our approaches are up to 2-3 orders of magnitude faster than the existing state-of-the-art methods.
Keywords/Search Tags:Graph query, Cohesive subgraph, Community search
PDF Full Text Request
Related items