Single Label Text Classification Based On Graph Neural Networks

Posted on: 2022-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xin
Full Text: PDF
GTID: 2518306323478704
Subject: Computer application technology
Abstract/Summary:
Text classification is a fundamental problem in natural language processing, with applications such as sentiment analysis, spam email detection, topic modeling, news categorization, and natural language inference. The task is to assign labels from a predefined set to a piece of text, which can be a sentence, a paragraph, or a whole article. Text classification problems are divided into single-label and multi-label problems. This thesis studies single-label text classification, in which each piece of text has exactly one label.

Text classification methods fall into two categories: statistical methods and deep learning methods. Most deep learning methods are based on deep neural networks, such as recurrent neural networks and convolutional neural networks. A graph neural network is also a kind of deep neural network, and it can process structured data. Because text has rich structural characteristics, applying graph neural networks to text classification has attracted wide attention. Text classification based on graph neural networks transforms the problem into node classification on a graph, and it usually has two stages: (1) build a text graph from the corpus; (2) feed the constructed text graph into a graph neural network for node classification.

Existing graph neural network processing mechanisms have two shortcomings: they ignore the supervised label information when building the text graph from the corpus, and the resulting text graph is fixed, which limits the scalability of the model. To address these shortcomings, this thesis designs two kinds of graph neural network processing mechanisms: homogeneous and heterogeneous. In the homogeneous graph neural network, label nodes are introduced so that the feature representations of different kinds of nodes are learned in the same semantic space, and text classification is then carried out. In the heterogeneous graph neural network, a more fine-grained processing mechanism is used for classification.

The first model proposed in this thesis is a label-incorporated homogeneous graph neural network model (Label-Incorporated GNN). Specifically, the model constructs a single text graph over the whole corpus that contains text nodes, word nodes, and label nodes, with different connection methods designed for different kinds of nodes. The constructed graph is then fed into a two-layer graph convolution network, which learns node representations for classification. Label-Incorporated GNN treats labels as part of the text graph. It is the first work to introduce label information directly into the text classification task using a graph convolution network. It improves the accuracy of text classification and learns interpretable label embeddings at the same time.

The second model is based on a heterogeneous graph neural network. Label-Incorporated GNN constructs a fixed text graph for the whole corpus, so it extends poorly to new text, and it treats different kinds of nodes and edges identically during graph convolution, so it tends to lose heterogeneous information. To address these shortcomings, this thesis proposes a heterogeneous graph neural network processing mechanism and a more flexible heterogeneous graph neural network model (InducGGN). The graph is divided into several subgraphs, each representing a different relationship between nodes, and each subgraph is processed by its own graph convolution network. Because the heterogeneous text graph is decomposed in this way, the heterogeneous model also has a lower computational cost than the homogeneous model.

To verify the effectiveness of the models, experiments are carried out on three classical text classification datasets: OHSUMED, R8, and R52. The classification accuracy of Label-Incorporated GNN is 1 to 2 percentage points higher than that of the previous best-performing algorithm. The experiments show that introducing label information into the graph convolution network helps the model learn more accurate text representations and improves classification accuracy. A visualization experiment on the label embeddings shows that the learned label embeddings are highly interpretable. The heterogeneous graph neural network model achieves accuracy competitive with the homogeneous method, while model training and testing are significantly faster. In addition, the heterogeneous method generalizes well. Finally, the thesis summarizes the advantages and disadvantages of the two methods and puts forward some future research directions.
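The two-stage pipeline described in the abstract (build a text graph containing text, word, and label nodes, then classify nodes with a two-layer graph convolution network) can be illustrated with a minimal sketch. This is not the thesis's implementation. The abstract does not specify edge weights or node features, so the sketch assumes raw term counts on document-word edges, a unit-weight edge between each labeled training document and its label node, one-hot node features, and random untrained weights. The function names `build_text_graph`, `normalize`, and `gcn_forward` are hypothetical.

```python
import math
import random


def build_text_graph(docs, labels, label_names):
    """Dense adjacency over document, word, and label nodes.

    ASSUMPTIONS (not from the thesis): document-word edges carry raw term
    counts; each labeled (training) document links to its label node with
    weight 1; every node gets a self-loop.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    n_docs, n_words = len(docs), len(vocab)
    n = n_docs + n_words + len(label_names)
    word_idx = {w: n_docs + i for i, w in enumerate(vocab)}
    label_idx = {l: n_docs + n_words + i for i, l in enumerate(label_names)}
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop
    for d, text in enumerate(docs):
        for w in text.split():
            j = word_idx[w]
            adj[d][j] += 1.0
            adj[j][d] += 1.0
        if labels[d] is not None:  # unlabeled (test) documents get no label edge
            j = label_idx[labels[d]]
            adj[d][j] = adj[j][d] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(a_hat, x, w1, w2):
    """Two-layer GCN forward pass: softmax(A_hat * relu(A_hat * X * W1) * W2)."""
    hidden = relu(matmul(matmul(a_hat, x), w1))
    return softmax_rows(matmul(matmul(a_hat, hidden), w2))


# Toy corpus: the third document is unlabeled, as a test document would be.
docs = ["cheap pills buy now", "meeting agenda for monday", "buy cheap pills"]
labels = ["spam", "ham", None]
label_names = ["spam", "ham"]

adj = build_text_graph(docs, labels, label_names)
a_hat = normalize(adj)
n = len(adj)

rng = random.Random(0)
x = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot features
w1 = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(n)]
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(len(label_names))] for _ in range(4)]

probs = gcn_forward(a_hat, x, w1, w2)  # one class distribution per node
doc_probs = probs[:len(docs)]          # rows for the document nodes
```

The weights here are untrained, so `doc_probs` carries no meaningful prediction. In a real setup the weights would be fitted by minimizing cross-entropy on the labeled document nodes. Because labels are nodes in the same graph, their hidden representations share a space with those of documents and words, which is the property the abstract relies on for interpretable label embeddings.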
Keywords/Search Tags: single-label text classification, graph neural network, label embedding, heterogeneous network