Font Size: a A A

Research On Cohesive Subgraph Search On Large Scale Graph

Posted on:2022-08-27Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z ChenFull Text:PDF
GTID:1488306482987759Subject:Software engineering
Abstract/Summary:PDF Full Text Request
The rapid development of Internet technologies has brought unprecedented op-portunities.Massive graph data generated under high-speed Internet contains a lot of valuable information.Cohesive subgraph search on graph data can help people to obtain useful information from massive data,and it is prevalent in many real-world applica-tions,such as community search and friend recommendation in social network,complex protein detection in protein-protein interaction network,commodity push in shopping network.In this thesis,we propose two valuable cohesive subgraph models: high-order truss model and balanced clique model,and three efficient search algorithms.The main contributions of this thesis are as follows:1.k-truss model is a typical cohesive subgraph model and has been received con-siderable attention recently.However,the k-truss model only considers the direct com-mon neighbors of an edge,which restricts its ability to reveal fine-grained structure information of the graph.Motivated by this,in this paper,we propose a new model named(k,?)-truss that considers the higher-order neighborhood(? hop)information of an edge.Based on the(k,?)-truss model,we study the higher-order truss decompo-sition problem which computes the(k,?)-trusses for all possible k values regarding a given ?.Higher-order truss decomposition can be used in the applications such as com-munity detection and search,hierarchical structure analysis,and graph visualization.To address this problem,we first propose a bottom-up decomposition paradigm in the increasing order of k values to compute the corresponding(k,?)-truss.Based on the bottom-up decomposition paradigm,we further devise three optimization strategies to reduce the unnecessary computation.We evaluate our proposed model and algorithms on real datasets,and the experimental results demonstrate the effectiveness of the(k,?)-truss model and the efficiency of our proposed algorithms.2.Clique is one of the most fundamental models for cohesive subgraph mining in network analysis.Existing clique model mainly focuses on unsigned networks.In real world,however,many applications are modeled as signed networks with positive and negative edges.As the signed networks hold their own properties different from the unsigned networks,the existing clique model is inapplicable for the signed net-works.Motivated by this,we propose the balanced clique model that considers the most fundamental and dominant theory,structural balance theory,for signed networks,and study the maximal balanced clique enumeration problem(MBCE)which computes all the maximal balanced cliques in a given signed network.We show that the maximal balanced clique enumeration problem is NP-Hard.A straightforward solution for the maximal balanced clique enumeration problem is to treat the signed network as two un-signed networks and leverage the off-the-shelf techniques for unsigned networks.How-ever,such a solution is inefficient for large signed networks.To address this problem,in this paper,we first propose a new maximal balanced clique enumeration algorithm by exploiting the unique properties of signed networks.Based on the new proposed al-gorithm,we devise two optimization strategies to further improve the efficiency of the enumeration.We conduct extensive experiments on large real and synthetic datasets.The experimental results demonstrate the efficiency,effectiveness and scalability of our proposed algorithms.3.In the field of cohesive subgraph analysis,the maximum cohesive subgraph search problem has been a hot research topic.In this thesis,we study the maximum balanced clique searching problem based on signed network to find the balanced clique with the largest number of vertices.We prove that this problem is NP-hard.The problem can be used in many applications,such as antagonistic community discovery in social networks,group mining in cooperative networks,business alliance search,antagonistic protein detection.In this thesis,we first explore a baseline algorithm for maximum balanced clique search based on MBCE.Due to the baseline algorithm consumes too large search space,we propose a new search algorithm based on search space partition.Each search region sets different lower bounds of the number of vertices from single side of balanced cliques to refine the search space.Moreover,we also propose a variety of optimization strategies to further reduce the search space.We conduct extensive experiments on eight big real-world signed graphs,the largest of which has hundreds of millions of edges.Experimental results show that our algorithm is efficient,effective and scalable.
Keywords/Search Tags:Cohesivesubgraph, (k,?)-Trussmodel, Higher-orderneighborhood, Search algorithm, Signed network, Balanced clique
PDF Full Text Request
Related items