
Research On Heterogeneous Networks Overlapping Community Discovery Algorithm Based On Network Embedding

Posted on: 2022-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: L W Han
GTID: 2480306515972789
Subject: Computer Science and Technology
Abstract/Summary:
Community discovery is an important research topic in the field of data mining. In community discovery research, real-world networks are abstracted into two types: homogeneous networks, in which all nodes and edges are of a single type, and heterogeneous networks, in which nodes and edges are of multiple types. Research on community discovery in homogeneous networks is relatively mature and complete compared with that on heterogeneous networks, but because the vast majority of real-world networks are heterogeneous, heterogeneous network community discovery has greater practical significance. At the same time, the heterogeneity of nodes and edges makes discovering communities in heterogeneous networks very challenging.

Building on existing research on heterogeneous network community discovery, this thesis proposes a heterogeneous network community discovery algorithm based on network embedding. The main research contents are: (1) a heterogeneous network similarity measurement method that fuses meta-paths using edge-type probabilities; (2) SAIN, an improved LeaderRank node ranking algorithm; and (3) NELPA, a new overlapping community discovery algorithm for heterogeneous networks, obtained by using the similarity measurement method and the node ranking algorithm to improve the traditional SLPA algorithm.

First, the similarity measurement method based on edge-type-probability meta-path fusion. Existing meta-path-based similarity measures for heterogeneous networks mostly rely on a single meta-path. However, a single meta-path can hardly cover all node and edge types in the network, and similarity values differ from one meta-path to another, so measuring similarity along only one meta-path can leave similarity values missing and make the measure inaccurate. To solve these problems, prior work proposed fusing multiple meta-paths to measure similarity. In this approach the choice of meta-path weights is critical, so this thesis proposes a new method that determines meta-path weights from edge-type probabilities, making the final allocation of weights more realistic. The heterogeneous network is then traversed to obtain node sequences, which are fed into the skip-gram embedding model, and the model's output is used to obtain node similarity values. Finally, the similarities under the multiple meta-paths are fused into the final similarity value.

Second, the improved LeaderRank node ranking algorithm SAIN. LeaderRank is a classic node ranking algorithm that plays an important role in identifying important nodes in a network. However, LeaderRank treats all neighbors of a node equally and divides the node's importance value evenly among them, even though a node's neighbors differ in their importance to it. This thesis therefore introduces a neighbor-influence index and assigns importance values to neighbors according to their differing influence.

Finally, NELPA, an overlapping community discovery algorithm for heterogeneous networks that improves the traditional SLPA algorithm. The speaker-listener label propagation algorithm (SLPA) is widely used in community discovery because its principle is simple and it runs efficiently. However, SLPA has three defects. It selects central nodes at random, which makes its community discovery results unstable and random. During label propagation it uses only the number of labels among neighboring nodes as the similarity measure, which is too coarse and lowers the accuracy of the results. And it models every network as homogeneous, ignoring the diversity of nodes and edges, so it is not suitable for direct application to heterogeneous networks. To remedy these defects, this thesis measures node similarity with the edge-type-probability meta-path fusion method, uses the resulting similarity values as the label similarity metric that guides propagation, and selects central nodes with SAIN, yielding an overlapping community discovery algorithm applicable to heterogeneous networks.

Two real heterogeneous network datasets, DBLP and Last FM, were used in the experiments. First, the proposed edge-type-probability meta-path weighting method was compared with the traditional meta-path weighting method; the results show that the weights it produces are closer to real-world understanding. Second, SAIN was compared with the classic ranking algorithms PageRank and LeaderRank, with spreading evaluated using the SIR epidemic model; the results show that SAIN achieves a faster spreading rate and a wider spreading range. Finally, NELPA was compared on the DBLP and Last FM datasets with several existing community discovery algorithms for heterogeneous networks, using the extended modularity EQ to measure the quality of the overlapping communities; the results show that NELPA discovers communities with higher accuracy and performs better.
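The abstract does not give the formula behind the edge-type-probability meta-path weights. The Python sketch below assumes, purely for illustration, that an edge type's probability is its share of all edges, that a meta-path's raw weight is the product of the probabilities of the edge types it traverses, and that the weights are normalized to sum to 1; the fused similarity is then a weighted sum of the per-meta-path similarities. All function names and the toy DBLP-style edge types (`writes`, `published_in`) are hypothetical.

```python
from collections import Counter
from math import prod


def edge_type_probabilities(edges):
    """Share of all edges carried by each edge type; edges are (u, v, edge_type)."""
    counts = Counter(edge_type for _, _, edge_type in edges)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def meta_path_weights(meta_paths, edge_probs):
    """ASSUMPTION: raw weight = product of edge-type probabilities along the path,
    normalized so that all meta-path weights sum to 1."""
    raw = {name: prod(edge_probs.get(t, 0.0) for t in path)
           for name, path in meta_paths.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no meta-path uses an observed edge type")
    return {name: w / total for name, w in raw.items()}


def fused_similarity(sims_per_path, weights, u, v):
    """Weighted sum of per-meta-path similarities; a missing pair counts as 0."""
    score = 0.0
    for name, sims in sims_per_path.items():
        pair_sim = sims.get((u, v), sims.get((v, u), 0.0))  # similarity is symmetric
        score += weights[name] * pair_sim
    return score


edges = [("a1", "p1", "writes"), ("a2", "p1", "writes"), ("a2", "p2", "writes"),
         ("a3", "p2", "writes"), ("p1", "v1", "published_in"), ("p2", "v1", "published_in")]
meta_paths = {"APA": ["writes", "writes"],
              "APVPA": ["writes", "published_in", "published_in", "writes"]}
weights = meta_path_weights(meta_paths, edge_type_probabilities(edges))
sims = {"APA": {("a1", "a2"): 0.8}, "APVPA": {("a1", "a2"): 0.5, ("a1", "a3"): 0.6}}
print(weights)                                      # approximately {'APA': 0.9, 'APVPA': 0.1}
print(fused_similarity(sims, weights, "a2", "a1"))  # approximately 0.77
```

With four `writes` edges and two `published_in` edges, the author-paper-author path receives weight 0.9 and the longer author-paper-venue-paper-author path 0.1. A product of probabilities favours short paths built from common edge types; the thesis's actual weighting rule may differ.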
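The abstract says the network is traversed to produce node sequences for skip-gram but does not say how. One common choice, used here as an assumption, is a metapath2vec-style meta-path-guided random walk: each step moves to a random neighbor whose node type is the next type in a symmetric meta-path such as author-paper-author. The sketch below generates such walks and includes the cosine similarity typically used to compare learned embedding vectors; the function names and toy graph are hypothetical, and skip-gram training itself is left to a library.

```python
import math
import random


def metapath_walk(adj, node_type, start, pattern, walk_length, rng):
    """Random walk whose node types follow a symmetric meta-path, e.g. ["A", "P", "A"]."""
    if len(pattern) < 2 or pattern[0] != pattern[-1]:
        raise ValueError("pattern must be a symmetric meta-path such as ['A', 'P', 'A']")
    if node_type[start] != pattern[0]:
        raise ValueError("start node type must match pattern[0]")
    cycle = pattern[:-1]              # ["A", "P", "A"] tiles as A, P, A, P, ...
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]   # type required at the next position
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:            # dead end: no neighbor of the required type
            break
        walk.append(rng.choice(candidates))
    return walk


def generate_walks(adj, node_type, pattern, walks_per_node, walk_length, seed=0):
    """walks_per_node walks from every node whose type matches pattern[0]."""
    rng = random.Random(seed)
    starts = [n for n in adj if node_type[n] == pattern[0]]
    return [metapath_walk(adj, node_type, s, pattern, walk_length, rng)
            for _ in range(walks_per_node) for s in starts]


def cosine(x, y):
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


adj = {"a1": ["p1"], "a2": ["p1", "p2"], "p1": ["a1", "a2", "v1"],
       "p2": ["a2", "v1"], "v1": ["p1", "p2"]}
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "v1": "V"}
walks = generate_walks(adj, node_type, ["A", "P", "A"], walks_per_node=3, walk_length=5)
print(walks[0])   # an author/paper/author/paper/author sequence starting at "a1"
```

The walks are lists of node identifiers, so they can be passed as sentences to a skip-gram implementation such as gensim's `Word2Vec` with `sg=1`; the similarity of two nodes under that meta-path would then be the cosine of their vectors.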
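Classic LeaderRank adds a ground node linked in both directions to every node, repeatedly lets each node pass its score to its out-neighbors in equal shares, and finally adds the ground node's score, split evenly, back to every node. The abstract does not define SAIN's neighbor-influence index, so the sketch below only shows where such an index would plug in: a node still sends 1/(k+1) of its score to the ground node, but splits the remaining k/(k+1) among its neighbors in proportion to a caller-supplied influence value. With uniform influence this reduces exactly to classic LeaderRank; using node degree as the influence, as in the usage line, is a hypothetical choice and not the thesis's index.

```python
def leaderrank(adj, influence=None, tol=1e-12, max_iter=10_000):
    """LeaderRank on an undirected graph {node: set_of_neighbors}.

    influence: optional {node: positive weight}. None gives classic LeaderRank;
    otherwise (HYPOTHETICAL SAIN-like variant) the non-ground share of a node's
    score is split among its neighbors in proportion to their influence.
    """
    nodes = list(adj)
    n = len(nodes)
    if influence is None:
        influence = {v: 1.0 for v in nodes}
    score = {v: 1.0 for v in nodes}   # every ordinary node starts with 1
    ground = 0.0                      # the ground node starts with 0
    for _ in range(max_iter):
        new = {v: ground / n for v in nodes}   # ground node spreads its score evenly
        new_ground = 0.0
        for j in nodes:
            k = len(adj[j])
            new_ground += score[j] / (k + 1)   # one of k + 1 out-links leads to ground
            total_influence = sum(influence[i] for i in adj[j])
            for i in adj[j]:
                new[i] += score[j] * k / (k + 1) * influence[i] / total_influence
        delta = max(abs(new[v] - score[v]) for v in nodes)
        score, ground = new, new_ground
        if delta < tol:
            break
    # final score: converged score plus an equal share of the ground node's score
    return {v: score[v] + ground / n for v in nodes}


adj = {0: {1, 2, 3}, 1: {0, 4}, 2: {0}, 3: {0}, 4: {1}}
classic = leaderrank(adj)
weighted = leaderrank(adj, influence={v: len(adj[v]) for v in adj})
print(max(classic, key=classic.get), max(weighted, key=weighted.get))  # 0 0
```

The hub, node 0, ranks first under both variants; classic LeaderRank gives it 25/18 (about 1.39), and in both variants the five scores still sum to 5 because every node passes on all of its score.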
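SLPA keeps a memory of labels for every node. In each round every node acts as a listener: each neighbor (speaker) sends one label drawn from its memory in proportion to label frequency, and the listener adds the most popular received label to its memory. After the rounds, labels whose frequency in a node's memory is below a threshold r are discarded, and nodes sharing a label form a possibly overlapping community. NELPA is described as replacing the plain label count with similarity-guided propagation and choosing central nodes with SAIN; the sketch below shows only the first idea, assuming each received label is weighted by a caller-supplied similarity between listener and speaker. The `sim` parameter and the toy graph are hypothetical, and the nested-community removal step of the original SLPA is omitted.

```python
import random
from collections import Counter, defaultdict


def slpa(adj, iterations=20, r=0.2, sim=None, seed=0):
    """Speaker-listener label propagation; returns {label: set_of_nodes}.

    sim: optional sim(listener, speaker) -> non-negative weight. None gives
    classic SLPA (each received label counts once); otherwise each received
    label is weighted by similarity (ASSUMED NELPA-style variant).
    """
    rng = random.Random(seed)
    memory = {v: Counter({v: 1}) for v in adj}   # each node starts with its own label
    order = list(adj)
    for _ in range(iterations):
        rng.shuffle(order)
        for listener in order:
            votes = Counter()
            for speaker in adj[listener]:
                labels, counts = zip(*memory[speaker].items())
                spoken = rng.choices(labels, weights=counts)[0]  # speaking rule
                votes[spoken] += 1.0 if sim is None else sim(listener, speaker)
            if not votes:
                continue                          # isolated node hears nothing
            best = max(votes.values())            # listening rule: most popular label
            winners = [label for label, w in votes.items() if w == best]
            memory[listener][rng.choice(winners)] += 1
    communities = defaultdict(set)
    for v, mem in memory.items():
        total = sum(mem.values())
        for label, count in mem.items():
            if count / total >= r:                # post-processing threshold
                communities[label].add(v)
    return dict(communities)


# Two 4-cliques joined by the bridge edge 3-4; labels crossing the bridge are down-weighted.
adj = {v: [u for u in range(4) if u != v] for v in range(4)}
adj.update({v: [u for u in range(4, 8) if u != v] for v in range(4, 8)})
adj[3].append(4)
adj[4].append(3)
communities = slpa(adj, sim=lambda a, b: 1.0 if (a < 4) == (b < 4) else 0.1)
print(communities)
```

Because speaking and tie-breaking are random, the exact communities depend on `seed`; fixing the seed makes a run reproducible, which is one way to observe the instability the abstract attributes to SLPA.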
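The abstract evaluates SAIN with the SIR epidemic model but gives no parameters. A common protocol, assumed here, seeds the infection at the top-ranked nodes and records how many nodes have ever been infected after each step: a faster-rising curve indicates a faster spreading rate and a higher final value a wider spreading range. The sketch below is a minimal discrete-time SIR simulation; the infection probability `beta`, the recovery probability `gamma`, and the toy path graph are illustrative assumptions.

```python
import random


def sir_spread(adj, seeds, beta, gamma=1.0, max_steps=100, seed=0):
    """Discrete-time SIR from the given seed nodes.

    Returns the number of ever-infected nodes (infected + recovered) after each step.
    """
    rng = random.Random(seed)
    infected = set(seeds)
    recovered = set()
    history = [len(infected)]
    for _ in range(max_steps):
        if not infected:
            break
        newly = set()
        for u in infected:
            for v in adj[u]:
                # each infected node tries to infect each susceptible neighbor once per step
                if v not in infected and v not in recovered and rng.random() < beta:
                    newly.add(v)
        recovered |= {u for u in infected if rng.random() < gamma}
        infected = (infected - recovered) | newly
        history.append(len(infected) + len(recovered))
    return history


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sir_spread(path, seeds=[0], beta=1.0))  # [1, 2, 3, 4, 5, 5]
```

Seeding the middle node of the path (`seeds=[2]`) reaches all five nodes in two steps instead of four, which is the kind of difference such an evaluation uses to compare rankings.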
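The extended modularity EQ, in its commonly used definition (assumed here to be the variant the thesis uses), generalizes Newman's modularity to overlapping communities by dividing each node pair's contribution by the number of communities each node belongs to: EQ = (1/2m) Σ_c Σ_{v,w ∈ c} [A_vw − k_v k_w / (2m)] / (O_v O_w), where m is the number of edges, A the adjacency matrix, k_v the degree of v, and O_v the number of communities containing v. The sketch below computes EQ directly from that definition for an undirected graph; with no overlapping nodes it reduces to ordinary modularity.

```python
from collections import Counter


def extended_modularity(adj, communities):
    """EQ for overlapping communities on an undirected graph {node: set_of_neighbors}."""
    two_m = sum(len(nbrs) for nbrs in adj.values())        # 2m = sum of degrees
    overlap = Counter(v for c in communities for v in c)    # O_v: communities containing v
    eq = 0.0
    for c in communities:
        for v in c:
            for w in c:
                a_vw = 1.0 if w in adj[v] else 0.0
                expected = len(adj[v]) * len(adj[w]) / two_m
                eq += (a_vw - expected) / (overlap[v] * overlap[w])
    return eq / two_m


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(extended_modularity(adj, [{0, 1, 2}, {3, 4, 5}]))        # approximately 0.357 (= 5/14)
print(extended_modularity(adj, [{0, 1, 2, 3}, {2, 3, 4, 5}]))  # approximately 0.143 (= 1/7)
```

The disjoint split reproduces the ordinary modularity 5/14, while letting nodes 2 and 3 belong to both communities lowers EQ to 1/7, since that overlap is not supported by additional edges.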
Keywords: Heterogeneous information network, Overlapping community discovery, Network embedding model, Similarity measure, Meta-path