| The rapid development of the Internet has produced a large amount of text data, such as web pages, news, papers, emails, and user comments. Mining the hidden characteristics of these data and classifying them automatically can help people make decisions and improve work efficiency. Text classification has long been a research hotspot and is widely used in question answering systems, recommendation, sentiment analysis, and other tasks. At present, deep learning is a popular approach to text classification. A graph neural network is a deep learning model that integrates deep learning algorithms with graph algorithms; it can capture the rich relationships between nodes in a graph and preserve the graph's global structure information. However, most existing models ignore the semantic relationship between the input document and the labels and do not make full use of label information, and text classification models based on graph convolutional neural networks ignore the differences between nodes and cannot extract the important information associated with each node. To address these problems, this paper proposes the HGCNLA model (Heterogeneous Graph Convolution Network with Incorporating Label and Dual Attention), which tries to make full use of the correlation between labels and nodes to improve the accuracy of graph neural network models in text classification. The specific work of this paper is as follows: 1. To address the neglected semantic relationship between input documents and labels, this paper proposes a label-incorporating heterogeneous graph construction strategy. The heterogeneous graph includes word nodes, document nodes, and label nodes. Edges are created between word nodes and word nodes, word nodes and document nodes, document nodes and document nodes, document nodes and label nodes, and label nodes and label nodes. The global structure information contained in the text is obtained through the
heterogeneous graph, so as to capture more of the label information contained in the document features and the relationships among words, documents, and labels. 2. To address the problem that graph convolutional neural networks ignore the differences between nodes, this paper introduces a dual-level attention module and combines it with the heterogeneous graph. Documents, words, and labels are treated as different types of nodes, and the importance of different neighbor nodes is taken into account, yielding a dual-level attention mechanism composed of type-level attention and node-level attention. Through type-level attention, the graph convolutional neural network captures the importance of different node types to a given node; through node-level attention, it captures the importance of different neighbor nodes to that node. The two levels together capture both kinds of importance simultaneously. 3. To make the document representations more discriminative and to classify documents more accurately, a contrastive loss is designed. The document and label representations after dual-level attention, as well as the document and label representations output by the network, are compared respectively, so that the inner product between a document representation and its corresponding label representation is enlarged relative to the inner products between that document representation and all label representations. In this way, each document is pulled closer to its corresponding label and pushed away from the other labels. 4. Experiments were conducted on four publicly available text classification datasets: the news classification datasets R8 and R52, the movie review dataset MR, and the medical literature dataset Ohsumed. The proposed method was compared with the typical methods TF-IDF+LR, TextCNN, Bi-LSTM, FastText, LEAM, Graph-CNN-C, Text GCN, and HGAT-C. The experimental results show that the proposed method generally outperforms the compared algorithms in final classification performance. In addition, ablation experiments were set up to verify the effectiveness of the label-incorporating heterogeneous graph construction strategy and the dual-level attention module. |
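
The loss in item 3 is described only in words, so its exact formula is not recoverable from the abstract. One common way to enlarge a document's inner product with its own label relative to its inner products with all labels is a softmax cross-entropy over document-label inner products. The sketch below assumes that form. The function name `contrastive_label_loss`, the argument names, and the toy data are hypothetical and are not taken from the paper. The abstract applies the comparison twice (after dual-level attention and at the network output); how the two terms are combined is not stated, so only a single term is shown.

```python
import numpy as np

def contrastive_label_loss(doc_repr, label_repr, targets):
    """Assumed form: mean over documents of -log softmax(<d, l_k>)[y].

    doc_repr:   (n_docs, dim) document representations
    label_repr: (n_labels, dim) label representations
    targets:    (n_docs,) integer index of each document's true label
    """
    # Inner product of every document with every label: (n_docs, n_labels).
    scores = doc_repr @ label_repr.T
    # Subtract the row maximum so the exponentials cannot overflow.
    scores = scores - scores.max(axis=1, keepdims=True)
    log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the true label for each document.
    return -log_prob[np.arange(len(targets)), targets].mean()

# Toy data (made up): 3 labels, 2 documents, 4-dimensional representations.
labels = np.eye(3, 4)
docs_aligned = np.array([[2.0, 0.0, 0.0, 0.0],
                         [0.0, 2.0, 0.0, 0.0]])
docs_misaligned = docs_aligned[::-1]   # each document now points at the wrong label
y = np.array([0, 1])

loss_aligned = contrastive_label_loss(docs_aligned, labels, y)
loss_misaligned = contrastive_label_loss(docs_misaligned, labels, y)
print(loss_aligned < loss_misaligned)
```

Under this assumed form, minimising the loss raises the inner product with the true label and lowers it relative to the sum over all labels, which matches the stated goal of moving a document toward its own label and away from the others.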