Font Size: a A A

Research On Large Scale Graph Clustering Optimization Algorithm

Posted on:2020-08-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:T Q HeFull Text:PDF
GTID:1360330620954222Subject:Software engineering
Abstract/Summary:PDF Full Text Request
At present,the complex network is getting bigger and bigger and shows full of vitality,represented by social networks like Twitter,with its daily active users exceeds 134 million,Snapchat,exceeds 200 million,and the monthly users of Facebook has exceeded 2 billion.These massive amounts of data constitute a very large-scale social network or graph,and similar networks include transportation networks,smart grid networks,biological networks,and academic collaboration networks.Complex networks provide a natural and effective way to simulate complex real-world systems in various fields.Because real-world networks represent meaningful structural communities in many areas,effectively discovering structural information has important research value and significance.Graph Clustering can be studied from different perspectives,including global network community detection,query local communities for target nodes,or explore the evolution of dynamics network.With the advancement of new information technologies,real network data sets in different fields have increased rapidly,the scale the size and complexity of network data sets are increasing by orders of magnitude.Graph clustering is an effective tool for analyzing,modeling,and predicting the evolution of complex networks in various scientific fields.In a graph or network,clusters are usually groups of vertices with a higher probability of connecting to each other to members of other groups,or the similarity between nodes is higher.Many scholars have tried to solve the problem of graph clustering,such as heuristic algorithm,graph segmentation and label propagation algorithm.While,there is no effective framework to handle large-scale graph clustering problem,such as poor clustering quality,complex clustering theory formula or high time complexity.How to deal with large-scale graph datasets remains difficult challenges: large-scale of data,irregular data access patterns,and computationally intensive.Aiming at the drawbacks of existing work,this thesis studies the technology of large-scale graph clustering mining,a fast graph clustering model has been designed,graph clustering models based on high-order structure were proposed and a parallel distance dynamic model for large-scale complex networks was addressed.The main work is as follows:(1)Considering the lack of effective global distance measurement between nodes of large-scale complex networks,a graph node distance representation method based on random path sampling is proposed,which can quickly and effectively obtain the global distance between nodes.When path sampling rate exceeds a certain percentage,the accuracy will have a higher quality assurance.In order to solve the problem that the density peak clustering algorithm cannot effectively handle the sparse region in a graph,a local K-nearest neighbor density method is proposed,which can detect the local sparse graph clustering instead of simply merging the node into other clusters or as a noise.Considering the fact that the density peak clustering model may misallocation the nodes,a two-level optimization rule is designed: direct neighbor connection rule and S-Core-KNN connection rule,which can identify the mis-allocated nodes and reduce the abnormality.The experimental results show that the RPDPC algorithm has better community structure discovery performance,and can perform better performance in sparse networks too.(2)The main task of complex network graph clustering is to detecting community structures,while the identification of special role nodes(such as anomalous nodes and hub nodes)is also a valuable task for understanding the functional structure of complex networks.How to detect community structures in complex networks and identify role nodes has become a challenging issue.Density-based graph clustering algorithms such as SCAN can discover community structures in complex networks to a certain aspects,also can identify anomalous nodes and hub nodes in complex networks.However,traditional algorithms only consider the problem from the perspective of single node or edge,which will ignore the high-order structural information in complex networks.Thus,a novel high-order density graph clustering model(HSCAN)is proposed,which can recognize community structures,hub nodes and abnormal nodes in complex networks based on high-order structural density.Considering that the traditional density-based clustering algorithm has parameter sensitivity problem and its seed node selection has randomness problems,a similarity measure is proposed.When there are multiple nodes satisfying the seed node candidate,the node with the largest similarity to the community is preferentially selected as the seed node on the community expansion of the next stage.The experimental results show that the proposed high-order density graph clustering algorithm is able to handle the community detection in complex networks,also can identify the hub nodes and abnormal nodes in polynomial time.(3)Considering the high-order structure formed by multiple nodes in complex networks,the traditional graph clustering method based on a single vertex or edge cannot meet the requirements.The clustering model based on high-order structure graph density has better clustering quality,but the time complexity is also higher.Thus,a local high-order graph clustering algorithm(MLEO)is proposed.First,a new clustering quality score(local Motif rate)is introduced to effectively reflect the clustering density of higher-order structures.Then,a local high-order expansion algorithm is proposed,the greedy method is addressed to maximize the clustering quality score.In addition,a new seed processing strategy(Motif-seed)is proposed,which extends the concept of node degree(M-degree),which improves the final community by obtaining an initial community with better clustering quality scores.The results show that the proposed strategy can achieve better performance than other methods,especially when the value of the mixed parameters reaches 0.6 or higher.(4)The extensive experimental results show that the distance dynamic model is not applicable to quickly deal with the large-scale(100 million-level edge)graph clustering problem,hence,a parallel graph clustering algorithm(H-Attractor)for large-scale networks is proposed.Firstly,based on the concept of local leadership,the constraints of strong-weak link between nodes are established,and priori constraints are added before dynamic interaction,aiming to reduce the number of computational edges.Then the convergence coefficient is defined to determine whether the dynamic interaction process is in a slow convergence or a non-convergence state.In the dynamic interaction process,when the percentage of the non-convergence distance is smaller than the convergence coefficient,the final value of the distance is pre-determined.Finally,the parallel distance dynamic model is implemented based on the double clone graph partitioning method.The experimental results show the timeliness and accuracy of H-Attractor on discovering large-scale networks(such as Uk-2007),the execution time on 4 nodes and 32 nodes are around 10 and 2 respectively.The improved parallel distance dynamic model shows the characteristics of fast convergence,accurate detection results and is potential to discover community from a billion-scale network.
Keywords/Search Tags:Complex network, random walk, high-order graph clustering, density peak, conductivity, distance dynamic model
PDF Full Text Request
Related items