In today’s era of big data, the analysis of complex networks is particularly important. Because community structure plays an indispensable role in uncovering the deeper structure and physical meaning of a complex network, this paper focuses on community detection algorithms for complex networks. Traditional community detection methods mainly consider the topology of the network, and can therefore describe community structure from only one point of view. However, missing, meaningless, and even wrong links are common in real networks, which casts doubt on the accuracy of community detection based on topology alone. For example, in a real network, edges between nodes are often missing because data acquisition is incomplete, and spurious edges arise from data entry errors. If we use only the topology information (that is, the edges between nodes) to detect community structure, this noise prevents us from obtaining ideal results.

Many researchers have worked to compensate for the noise in the topology information. A recently popular approach is to fuse the network topology with the attribute information of the nodes and to detect communities from the two together. Unlike topology information, attribute information emphasizes the characteristics of the nodes themselves. For example, a node in a social network (that is, a person) is usually annotated by a profile containing education background, circle of friends, and occupation; a node in a citation network (that is, a paper) usually carries annotations such as title, abstract, and keywords. Node attribute information thus captures the characteristics of a single node and provides valuable information orthogonal to the network topology. Because of this, the
combination of the two has become a research hotspot. However, how to combine these two valuable sources of information effectively remains a challenging problem. In theory, if the nodes in a community are closely connected topologically, they should also have highly similar characteristics (attribute information). In other words, nodes belonging to the same community should have similar attributes, which supplements the network topology information. Therefore, even if two nodes are not directly connected, they may belong to the same community if they share the same attribute information. To address these problems, this paper combines topology information and attribute information, proposes three methods for community structure detection, and carries out systematic experiments on synthetic and real networks to verify the effectiveness and practicability of the proposed methods.

This paper is divided into seven chapters. The first chapter is the introduction, which briefly describes the research background and significance of this paper and introduces its main innovations and structure. The second chapter briefly introduces the basic topological properties of networks, several basic network models, and several common community evaluation indexes, and then, starting from the existing literature, reviews the history and main algorithms of community detection related to this paper. The third chapter proposes two community detection methods, TANMF and TASNMF, based on non-negative matrix factorization and network attribute information, and gives their multiplicative update rules together with a rigorous convergence proof. The algorithms proposed in this chapter address shortcomings of current community detection methods based on non-negative matrix factorization, such as too many parameters and cumbersome computation. The fourth chapter is different
from the third chapter. Based on the concept of leader communities in complex networks, we do not assume that every node in the network has the same weight. At the start of the analysis, we find the leader nodes in the network by combining the network topology information and node attribute information. On this basis, we propose ALFCD, a leader-driven model that incorporates attribute information. The fifth chapter is motivated by the observation that each of the above methods performs differently on different networks, so that a single method rarely gives ideal results. On this basis, an ensemble clustering method, AEC, is proposed to optimize community detection in attributed networks. In the sixth chapter, based on the daily closing price data of the Shenzhen 100 index, the correlation coefficients between the daily volatilities of the stocks are calculated, and the Planar Maximally Filtered Graph (PMFG) network of the Shenzhen 100 index is constructed after removing the market volatility. The three methods proposed in this paper are then used to detect communities in the PMFG network, in order to reveal the community information in the stock network and analyze the interaction between stock fluctuations. The seventh chapter summarizes the paper, discusses its shortcomings with respect to the algorithms themselves and their application scenarios, and outlines feasible and extensible directions for future research.

The main innovations of this paper are as follows.

First, an innovation based on the non-negative matrix factorization algorithm. Non-negative matrix factorization is widely used for community detection in complex network analysis because its purely additive constraints give the results a clear physical meaning. As data grow, single-matrix factorization can no longer meet the demand. In addition, beyond the widely used network topology information, node attribute information also plays an indispensable role in community detection. However, the existing community detection
methods based on network attribute information usually have too many parameters and a complex calculation process. To address these shortcomings, two parameter-free non-negative matrix factorization algorithms, TANMF and TASNMF, are proposed. In addition, we design iterative update rules that ensure the convergence of the objective function. Extensive experiments are then carried out on synthetic and real networks. The experimental results show that our algorithms outperform the existing NMF algorithms.

Second, an innovation based on the leader-driven model. Most existing leader-based community detection models detect communities from the network topology alone, or use attribute information only to fill in missing topology information, which loses some of the integrity of the information. To identify leaders and followers in the network more accurately, this paper proposes a leader-driven algorithm, ALFCD, which combines network topology information and node attribute information. First, the attribute similarity matrix S is generated from the attribute information of the nodes, and the weight matrix C is then generated by incorporating S and used to build the dependency tree. After the local leader of each node is computed, a preliminary dependency relationship is obtained. The dependency trees are then merged according to the given number of communities to form the final result. This leader-driven method, which incorporates attribute information, achieves good results on both synthetic and real networks.

Third, an innovation based on ensemble clustering. Ensemble clustering has two steps: generating the base clusterings and fusing them. In this paper, a new method, AEC, is proposed. Most existing ensemble clustering algorithms based on non-negative matrix factorization rely on single-matrix factorization (that is, only one information matrix
is used from the base clusterings), which leads to a loss of accuracy. In this paper, the RA matrix is used as the information matrix of the ensemble results, and the frequency with which samples fall in the same cluster is used as the corresponding attribute matrix W. The final community detection results are obtained by double matrix factorization.
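
The frequency-based matrix W mentioned above can be illustrated with a co-association matrix, in which each entry is the fraction of base clusterings that place two nodes in the same cluster. The following is a minimal sketch under stated assumptions, not the implementation used in this paper: the function name `co_association` and the toy base clusterings are hypothetical, and the exact normalization of W and the construction of the RA matrix are not specified here.

```python
import numpy as np

def co_association(labelings):
    """Fraction of base clusterings in which each pair of nodes shares a cluster.

    `labelings` has shape (n_clusterings, n_nodes); entry [k, i] is the
    cluster label that base clustering k assigns to node i.
    (Hypothetical helper, for illustration only.)
    """
    labelings = np.asarray(labelings)
    # same[k, i, j] is True when clustering k puts nodes i and j together.
    same = labelings[:, :, None] == labelings[:, None, :]
    # Averaging over clusterings gives a co-occurrence frequency in [0, 1].
    return same.mean(axis=0)

# Three toy base clusterings of four nodes (assumed data).
base = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
W = co_association(base)
```

Here nodes 0 and 1 are grouped together in every base clustering, so their entry in W is 1, while nodes 0 and 3 never share a cluster, so their entry is 0. Cluster labels only need to be consistent within a single base clustering, since the comparison is made per clustering.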