
Research And Implementation Of Community Detection Algorithms In Large Scale Information Networks

Posted on: 2019-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: S X Li
GTID: 2428330542499989
Subject: Information and Communication Engineering
Abstract/Summary:
The rapid development of Internet technology affects and transforms every aspect of human life. A highly informatized society resembles a huge network that encompasses everything on earth: each entity is a node that operates on its own yet is inextricably linked to the others. A network is thus an abstraction of real society. Such networks are highly interconnected and heavily overlapping, and systems in biology, social networking, academia, and information technology all take the form of networks. Community detection identifies and mines the hidden hierarchical community structure of a network. It can therefore help people discover hidden regularities in the network, explain the social phenomena and systems that networks represent, and predict how real society will develop.

In recent years, many scholars have devoted themselves to community detection algorithms and have driven rapid progress in the field. However, existing algorithms still have several problems. First, most of them target non-overlapping communities, although overlapping communities are very common in real-world networks such as Facebook and Weibo. Second, traditional algorithms waste information that is available in the network. Finally, real-world networks are often complex: existing algorithms achieve good results on simulated datasets but perform poorly on real networks. In addition, with the spread of the Internet and mobile smart terminals, network scale has been growing exponentially. Traditional community detection algorithms suit small and medium-sized networks; they scale poorly and can no longer cope with rapidly growing networks.

This thesis first reviews the history of community detection, briefly introduces the related concepts and the classic community detection algorithms, analyzes the advantages and disadvantages of each algorithm, and compares the algorithms experimentally on the same datasets. It also describes the challenges that community detection currently faces.

To address the waste of information in traditional algorithms, which generally do not take both node information and network structure into account, this thesis proposes a community detection method that combines node attribute information with network structure. The method uses node information effectively by weighting node attributes, and it balances the contributions of node information and network structure through adjustment parameters. The node attribute information and the network structure information are merged into a weight matrix by matrix summation, and a new network is then constructed from that weight matrix. A weight threshold further reduces unnecessary computation. Experiments show that the resulting weight matrix is richer and more detailed than the traditional adjacency matrix and overcomes its data sparsity problem, because it draws on multiple kinds of node information.

To address overlapping communities and scalability, this thesis proposes a community detection algorithm based on edge representation learning, called CD-ERL. CD-ERL extends techniques from natural language processing to community detection. In the edge representation extraction stage, a neural network automatically learns edge vectors, mapping the edges of the network into a continuous space. A clustering algorithm then groups the edges into clusters, and the resulting edge communities are finally transformed into node communities. Unlike traditional hard clustering approaches, CD-ERL uses soft clustering to detect overlapping communities. Neural networks support large-scale parallel processing and distributed information storage, which improves the scalability of CD-ERL and its ability to handle large-scale networks. Finally, the thesis runs CD-ERL on nine synthetic benchmark networks and two real-world networks and compares it with two classic community detection algorithms, the Louvain algorithm and the label propagation algorithm (LPA). The experiments show that CD-ERL outperforms these classic algorithms on NMI and V-measure.
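
The fusion of node attributes and network structure described in the abstract can be illustrated with a minimal Python sketch. The abstract states only that attributes are weighted, that adjustment parameters balance the two information sources, that the two are merged by matrix summation, and that a weight threshold prunes entries. Everything else here is an assumption made for illustration: the function names, the single parameter `alpha`, the use of cosine similarity between weighted attribute vectors, and the example values are hypothetical and are not taken from the thesis.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse_structure_and_attributes(adjacency, attributes, attr_weights,
                                  alpha=0.5, threshold=0.1):
    """Merge an adjacency matrix and node attributes into one weight matrix.

    Assumed form (hypothetical, not the thesis's exact formula):
        W[i][j] = alpha * A[i][j] + (1 - alpha) * sim(x_i, x_j)
    Entries below `threshold` are set to 0 so that weak links can be skipped
    by later steps.
    """
    n = len(adjacency)
    # Scale every attribute by its weight before comparing nodes.
    scaled = [[w * x for w, x in zip(attr_weights, row)] for row in attributes]
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = (alpha * adjacency[i][j]
                 + (1 - alpha) * cosine_similarity(scaled[i], scaled[j]))
            if w >= threshold:
                weights[i][j] = weights[j][i] = w  # undirected network
    return weights


if __name__ == "__main__":
    # Three nodes: 0 and 1 are connected, node 2 has no edges at all.
    A = [[0, 1, 0],
         [1, 0, 0],
         [0, 0, 0]]
    # Two attributes per node; node 2 shares one attribute with nodes 0 and 1.
    X = [[1.0, 0.0],
         [1.0, 0.0],
         [1.0, 1.0]]
    for row in fuse_structure_and_attributes(A, X, [1.0, 1.0],
                                             alpha=0.5, threshold=0.3):
        print([round(w, 3) for w in row])
```

In this toy input, node 2 has no edges in the adjacency matrix, yet it receives nonzero weights to nodes 0 and 1 through attribute similarity, which is one way a fused matrix can be less sparse than the adjacency matrix alone. Raising `threshold` removes those weaker attribute-only links again.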
Keywords/Search Tags: Network analysis, Community detection, Representation learning, Clustering