
Research on Community Discovery Algorithms in Citation Networks Based on Representation Learning and Min-Max Community Extraction

Posted on: 2022-11-01; Degree: Master; Type: Thesis
Country: China; Candidate: N C Fu
GTID: 2510306746968759; Subject: Computer Science and Technology
Abstract/Summary:
Community discovery on a citation network serves two purposes. On the one hand, it reveals the structural characteristics of the network from a macro perspective and mines useful topological information. On the other hand, it groups similar papers into the same community, helping researchers quickly find a batch of papers of interest so that they can focus on the research itself instead of spending a lot of time searching for relevant literature.

At present, the field of community discovery still faces several problems. First, because the network is sparse, the topological information formed by following, citation, or other relationships between nodes is insufficient, and high-order neighborhood relationships are difficult to capture. This lowers the quality of representation learning and, in turn, degrades the subsequent community discovery tasks built on the learned representations. The issue matters especially in paper citation networks, where each paper carries rich text information in addition to the citation relationships between papers; effectively integrating the topological citation relationships with each paper's own text features can yield high-quality node representations. Second, community discovery algorithms typified by label propagation suffer from a redundant propagation process, a random node update order, and an overly simple node label update strategy, so they cannot be applied directly and effectively to citation networks.

To address these problems, this thesis proposes an overall model for community discovery in citation networks based on representation learning and min-max community extraction. The model consists of two parts.

The first part is a structure-enhanced graph convolutional network representation learning model (SEGCN). Built on an autoencoder, SEGCN fuses the structural and text information of the paper citation network and enhances the network structure to obtain a denser network, which alleviates the sparseness of the citation network. Graph convolutional layers then capture information from high-order neighborhoods to produce high-quality node representation vectors.

The second part is an improved label propagation algorithm based on min-max community extraction (MMCLPA), which improves label propagation in three respects: the initialization stage, the node label update order, and the node label update strategy. In the initialization stage, similar nodes repeatedly executing the propagation process make label propagation redundant; to solve this, the initial network is pre-clustered based on the Node2 select strategy, and all nodes in a min-max community are assigned a uniform label, so that the propagation process is not repeated for them and its execution cost is reduced. For the node label update order, node influence is described by combining bidirectionally reachable neighbors, node degree, and triangle structures, and nodes are re-ordered by influence to avoid "monster communities" forming during propagation. For the node label update strategy, node representation vectors are obtained from the SEGCN model, and a graph attention mechanism describes the label propagation strength between nodes.

Experiments on citation networks show that SEGCN effectively integrates the citation relationships between papers with each paper's own text features and obtains higher-quality node representation vectors. These vectors achieve higher accuracy on the classification task and can be applied effectively to the subsequent community discovery task based on node representation learning. MMCLPA groups papers on the same topic into the same community and improves significantly on multiple evaluation metrics. Beyond citation network datasets, the proposed algorithm also shows a degree of generality, performing well on several non-citation network datasets.
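The abstract describes the SEGCN encoder only at a high level. The following minimal sketch shows, in plain Python, the standard building blocks such a model relies on: a graph-convolution layer using the usual symmetric normalization D^-1/2 (A + I) D^-1/2, and an inner-product decoder of the kind used by graph autoencoders (an assumption; the thesis does not state its decoder). How SEGCN actually enhances the structure is not given here, so `enhance_structure`, which simply links any two nodes that share a neighbor, is a hypothetical stand-in. All function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (no NumPy, to stay self-contained)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def enhance_structure(adj: Matrix) -> Matrix:
    """Hypothetical densification: also connect nodes that share a neighbor (2-hop paths)."""
    n = len(adj)
    two_hop = matmul(adj, adj)
    return [[1.0 if adj[i][j] > 0 or (i != j and two_hop[i][j] > 0) else 0.0
             for j in range(n)] for i in range(n)]


def normalize(adj: Matrix) -> Matrix:
    """Standard GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W); rows of H are node features (e.g. text vectors)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def inner_product_decoder(z: Matrix) -> Matrix:
    """Assumed GAE-style decoder: reconstructed edge probability sigmoid(z_i . z_j)."""
    return [[1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(zi, zj)))) for zj in z]
            for zi in z]
```

Stacking two such layers lets each node's vector draw on its two-hop neighborhood, which is what "capturing high-order neighborhood information" amounts to in a GCN. Training the weight matrices against the decoder's reconstruction loss is omitted from this sketch.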
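Two of the MMCLPA modifications, influence-ordered updates and similarity-weighted label votes, can be illustrated with a simplified asynchronous label propagation. This is a sketch under stated assumptions, not the thesis's algorithm: the influence score (degree plus triangle count) is hypothetical and drops the bidirectional-reachability term because the graph is treated as undirected; each neighbor's vote is weighted by a softmax over cosine similarities of node embeddings, a simplified stand-in for a learned graph attention layer; and the min-max pre-clustering step is skipped, so every node starts with its own label. Function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set

Graph = Dict[int, Set[int]]


def triangle_count(g: Graph, v: int) -> int:
    """Number of triangles through v: pairs of v's neighbors that are linked to each other."""
    nbrs = sorted(g[v])
    return sum(1 for i, a in enumerate(nbrs) for b in nbrs[i + 1:] if b in g[a])


def influence(g: Graph, v: int, alpha: float = 1.0) -> float:
    """Hypothetical influence score: degree plus alpha times the triangle count."""
    return len(g[v]) + alpha * triangle_count(g, v)


def cosine(x: Sequence[float], y: Sequence[float]) -> float:
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def attention_weights(emb: Dict[int, List[float]], v: int, nbrs: List[int]) -> Dict[int, float]:
    """Softmax over cosine similarities (simplified stand-in for learned attention)."""
    scores = [cosine(emb[v], emb[u]) for u in nbrs]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    total = sum(exps)
    return {u: e / total for u, e in zip(nbrs, exps)}


def label_propagation(g: Graph, emb: Dict[int, List[float]], max_iter: int = 20) -> Dict[int, int]:
    labels = {v: v for v in g}  # no pre-clustering here: each node starts in its own community
    order = sorted(g, key=lambda v: (-influence(g, v), v))  # fixed order, most influential first
    for _ in range(max_iter):
        changed = False
        for v in order:
            if not g[v]:
                continue  # an isolated node keeps its own label
            nbrs = sorted(g[v])
            weights = attention_weights(emb, v, nbrs)
            votes: Dict[int, float] = {}
            for u in nbrs:
                votes[labels[u]] = votes.get(labels[u], 0.0) + weights[u]
            best_score = max(votes.values())
            if votes.get(labels[v], 0.0) >= best_score - 1e-12:
                continue  # current label is already among the strongest, so keep it
            # Deterministic tie-break: smallest label among the strongest candidates.
            labels[v] = min(lab for lab, s in votes.items() if s >= best_score - 1e-12)
            changed = True
        if not changed:
            break
    return labels
```

On two triangles joined by a single edge (nodes 0-1-2 and 3-4-5), with embeddings that separate the two groups, the loop settles on one label per triangle. Fixing the visiting order removes the run-to-run randomness of standard label propagation, and breaking ties by the smallest label keeps the result reproducible.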
Keywords/Search Tags: network representation learning, community discovery, graph neural network, label propagation, graph attention mechanism