Font Size: a A A

Study On The Network Community Detection With Node Attributes

Posted on:2019-09-07Degree:DoctorType:Dissertation
Country:ChinaCandidate:F Q TangFull Text:PDF
GTID:1368330596954899Subject:Mathematical probability theory and mathematical statistics
Abstract/Summary:PDF Full Text Request
Recently,with the development of modern information technology,a number of systems can be represented as networks,where the elementary of the system and their interactions are defined as nodes and links[114].The complex systems are orga-nized by modules which have their own special functions.In networks,such modules appear as a set of nodes with high-density internal links,while the link density be-tween modules is low.These subgraphs are called communities or modules.Many real networks such as social networks,research cooperation networks,food chain networks,etc.,have community structures[21].Identifying the community structure of complex systems results in further analysis on their internal organizational struc-ture and function.Therefore,detecting communities is a fundamental problem in network science[53].Real word datasets are usually from multiple sources,for example,data de-scribing network structure information and some others describing the properties of nodes.Most of the existing community detection algorithms only focus on network structure,while the traditional clustering algorithm only focuses on the attributes.The community should compose of nodes with homogeneous link density and simi-lar attributes.Therefore,this dissertation aims at finding groups of the network by combining the structural information and the node attributes.We propose three al-gorithms based on different criteria measuring the strength of community structure and node attributes.(1)SpcSA algorithm.The algorithm combines the node attributes and transforms them into the weights of the corresponding edges between nodes.The weights of the edges between nodes with similar attribute information increases,and the edge weights will be reduced with distinct node attribute information be-tween their end nodes.Thus the structural information of the original network is strengthened.Nodes in the same community should have similar attribute information.However,in reality,nodes have multivariate attribute,and some attributes are noise or redundant.In order to address this problem,our method introduces a mechanism by which attribute weights can adjust themselves via a certain weight.The larger the weight is,the greater the role of this attribute plays.We adopt an iterative method similar to the EM algorithm to perform the SpcSA algorithm,that is,fix the weights of the node attributes to estimate the community labels of the nodes,and then fix the cluster labels to estimate the attribute weights.The weights depend on the distribution of the attributes.If some attribute simultaneously possesses high between clusters distance and low within cluster distance,its weight will be large.Simulation experiments and real data experiments show that SpcSA algorithm is effective in identifying communities and learning the relative importance of each node attribute.(2)NMNA algorithm.Many real-world network data has incomplete structural data caused by isolated nodes,for example,users make privacy settings to pre-vent non-friends from viewing their personal information.Such incompleteness causes additional challenges to community detection algorithms.Simultane-ously,these isolated nodes may have abundant attribute information which is helpful to infer their community affiliation.We propose a community detection algorithm called NMNA combining both structural information and attribute information.The clustering method integrates the cost of clustering node at-tributes with the cost of clustering the structural information via the normalized modularity.We show that solve the problem of optimizing objective function is related to the knowledge of matrix.Considering the diversity of the node at-tribute,we should distinguish the relative importance of the attributes.Thus,the NMNA algorithm constructs a cost function in the form of a convex function embedded in the attribute weights,and estimates the attribute weights by the gradient descent method.(3)DCFI algorithm.Firstly,we construct a joint structure-attribute model which is composed of two layers.In the structural layer,the graph distance is used to describe the similarity between nodes.In the attribute layer,the similarity between nodes is described by some function of the distance between nodes along the attributes.If the attribute is numerical,the similarity between nodes is a function of the Euclidean distance along the attribute.If the node attribute is categorical,the similarity between nodes is a function of the Hamming distance along the attribute.We propose the community detection algorithm(DCFI)based on spectral clustering combining both information.A new representation of the original data is a union of the eigen vector generated by the adjacency matrix and that from the node attribute similarity matrix.Moreover,DCFI uses attribute-weighted adjustment method similar to that in SpcSA to identify the degree of contributions of attributes.The DCFI algorithm can effectively solve the community detection problem of networks with non-homogeneous node degrees.The DCFI algorithm can effectively solve the community detection problem of networks with non-homogeneous node degrees.Further,DCFI can deal with three types of network,including strong assortative,weakly assortative and disassortative.
Keywords/Search Tags:Network, Community structure, Spectral clustering, Modularity, Node attributes, Graph distance
PDF Full Text Request
Related items