Research on Text Classification Algorithm Based on Graph Neural Network

Posted on: 2022-07-22 | Degree: Master | Type: Thesis
Country: China | Candidate: G K Wang
GTID: 2518306338466774 | Subject: Information and Communication Engineering
Abstract/Summary:
The popularity of the mobile Internet has accelerated information dissemination, and most of the information on the network exists in the form of text. As one of the basic tasks of natural language processing, text classification is widely used in news classification, search matching, information filtering and sentiment analysis. Text classification still faces two problems. On the one hand, when labeled data is scarce, models based on machine learning or neural networks cannot be fully trained. On the other hand, when there are few texts or the texts are short, word contexts are sparse and semantic information is missing. To address these problems, this thesis studies graph neural network algorithms for semi-supervised text classification. Graph neural networks are well suited to semi-supervised learning and can capture global information. This thesis studies both transductive and inductive semi-supervised text classification algorithms based on graph neural networks: the transductive method makes full use of unlabeled data, and the inductive method can classify new texts in real time. Finally, a keyword-based real-time text classifier is designed. The main contributions of this thesis are as follows:

(1) A self-training semi-supervised text classification algorithm based on a graph convolutional network. To improve classification performance in a simple way when only a small amount of labeled data is available, this thesis studies a transductive semi-supervised text classification algorithm based on a graph convolutional network (GCN). First, the algorithm builds the whole text corpus into a graph with two kinds of nodes, word nodes and document nodes, so that all unlabeled data is fully used. Second, the algorithm derives a confidence score for each word from its degree of ambiguity and incorporates this confidence into the edge weights of the graph, which weakens the edges of ambiguous words and reduces their influence. Finally, high-confidence words that act as keywords in the text are automatically labeled as pseudo-labeled nodes and added to the training set, and their label information is propagated through the graph by the graph convolution operation. Experimental results show that, compared with existing text classification models, the algorithm improves classification accuracy on every dataset.

(2) A sentence-level semi-supervised text classification algorithm based on a graph attention network. Building the whole corpus into one large graph, as in (1), takes up too much memory and cannot classify new text in real time. To solve this, this thesis studies an inductive semi-supervised text classification algorithm based on a graph attention network (GAT). First, each sentence, together with sampled texts, is built into a graph that contains only word nodes; adding the sampled texts lets the algorithm enrich word context with unlabeled text. Second, the algorithm uses N-grams and syntactic dependencies to relate words from multiple perspectives, which adds edges to the graph. Finally, a message-passing mechanism based on multi-head attention automatically aggregates useful information from neighboring nodes. Node information is shared globally, which supports batch training and real-time prediction on new text. Experimental results show that the model achieves the best results on multiple datasets compared with existing text classification models.

(3) A keyword-based real-time text classifier built on graph neural networks. In practice, the domain or label set of a text classification task often changes, which would otherwise require relabeling huge amounts of data. To address this, this thesis designs a real-time text classifier based on the two algorithms above and evaluates it experimentally. The results show that the classifier can classify new text quickly and correctly with only a small number of keywords, which indicates that the proposed classification algorithms have good application prospects.
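The graph construction in contribution (1) can be sketched as follows. The abstract does not give the confidence formula, so this minimal sketch assumes a TextGCN-style graph (TF-IDF weights on document-word edges, positive PMI over sliding windows on word-word edges) and a hypothetical word confidence equal to 1 minus the normalized entropy of the word's class distribution over the labeled documents. The function names `word_confidence`, `doc_word_edges` and `word_word_edges` are illustrative, not the thesis's code.

```python
import math
from collections import Counter, defaultdict


def word_confidence(docs, labels, num_classes):
    """Hypothetical confidence: 1 - normalized entropy of p(class | word),
    estimated from the labeled documents that contain the word.
    labels[i] is a class id, or None for an unlabeled document."""
    counts = defaultdict(Counter)  # word -> Counter(class -> number of labeled docs)
    for tokens, y in zip(docs, labels):
        if y is not None:
            for w in set(tokens):
                counts[w][y] += 1
    conf = {}
    for w in {w for tokens in docs for w in tokens}:
        total = sum(counts[w].values())
        if total == 0:
            conf[w] = 1.0  # assumption: no labeled evidence, so no down-weighting
            continue
        entropy = -sum(n / total * math.log(n / total) for n in counts[w].values())
        conf[w] = 1.0 - entropy / math.log(num_classes)  # ambiguous word -> low confidence
    return conf


def doc_word_edges(docs, conf):
    """Document-word edges: TF-IDF scaled by the word's confidence."""
    df = Counter(w for tokens in docs for w in set(tokens))
    edges = {}
    for d, tokens in enumerate(docs):
        for w, n in Counter(tokens).items():
            tfidf = n / len(tokens) * math.log(len(docs) / df[w])
            edges[(f"doc{d}", w)] = tfidf * conf[w]
    return edges


def word_word_edges(docs, conf, window=3):
    """Word-word edges: positive PMI over sliding windows, scaled by both
    words' confidences (using the product is an assumption)."""
    windows = []
    for tokens in docs:
        if len(tokens) <= window:
            windows.append(tokens)  # a short document counts as one window
        else:
            windows.extend(tokens[i:i + window] for i in range(len(tokens) - window + 1))
    single, pair = Counter(), Counter()
    for win in windows:
        uniq = sorted(set(win))
        single.update(uniq)
        for i in range(len(uniq)):
            for j in range(i + 1, len(uniq)):
                pair[(uniq[i], uniq[j])] += 1
    edges = {}
    for (a, b), n_ab in pair.items():
        pmi = math.log(n_ab * len(windows) / (single[a] * single[b]))
        if pmi > 0:  # keep only positively associated word pairs
            edges[(a, b)] = pmi * conf[a] * conf[b]
    return edges
```

A word such as "team" that appears equally in two classes gets confidence 0 under this assumed rule, so every edge it touches is weakened to zero, which is the effect the abstract describes for ambiguous words.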
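For the propagation step in contribution (1), the standard two-layer GCN rule (Kipf and Welling), as used by TextGCN, is given below for reference. The abstract does not say whether the thesis modifies this rule, so treat it as an assumption. Here A is the confidence-weighted adjacency matrix of the word-document graph, X the input node features, Y the one-hot labels, and the loss is summed over the labeled document nodes (set Y_D) and the pseudo-labeled keyword nodes (set Y_P), which is how pseudo labels would enter training.

```latex
\tilde{A} = A + I, \qquad \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}, \qquad
\hat{A} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}

Z = \mathrm{softmax}\big( \hat{A} \, \mathrm{ReLU}( \hat{A} X W^{(0)} ) \, W^{(1)} \big)

\mathcal{L} = - \sum_{v \in \mathcal{Y}_D \cup \mathcal{Y}_P} \sum_{k=1}^{K} Y_{vk} \ln Z_{vk}
```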
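The self-training step of contribution (1), in which high-confidence keywords become pseudo-labeled nodes, might look like the sketch below. The selection rule is an assumption: a word is promoted when its confidence reaches a hypothetical `threshold` and it occurs in at least `min_docs` labeled documents, and its pseudo label is the majority class of those documents. Node names follow the previous sketch (`doc{i}` for documents, the word itself for word nodes); after the training set is extended, the GCN would be retrained on it.

```python
from collections import Counter, defaultdict


def select_keyword_pseudo_labels(docs, labels, conf, threshold=0.9, min_docs=2):
    """Hypothetical rule: promote confident, well-supported words to
    pseudo-labeled word nodes; labels[i] is a class id or None."""
    class_counts = defaultdict(Counter)  # word -> Counter(class -> labeled doc count)
    for tokens, y in zip(docs, labels):
        if y is not None:
            for w in set(tokens):
                class_counts[w][y] += 1
    pseudo = {}
    for w, counts in class_counts.items():
        support = sum(counts.values())
        if conf.get(w, 0.0) >= threshold and support >= min_docs:
            pseudo[w] = counts.most_common(1)[0][0]  # majority class as pseudo label
    return pseudo


def build_train_set(labeled_doc_ids, labels, pseudo):
    """Training nodes = labeled document nodes + pseudo-labeled word nodes."""
    train = {f"doc{d}": labels[d] for d in labeled_doc_ids}
    train.update(pseudo)
    return train
```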
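The word-only graph of contribution (2) can be sketched as follows, under stated assumptions: words that co-occur inside an n-gram of the sentence or of a sampled unlabeled text are linked, and dependency arcs are passed in as `(head_index, dependent_index)` pairs that some external parser is assumed to produce (no parser is called here). `build_word_graph` and its parameters are hypothetical names; each edge records which relation(s) created it.

```python
from collections import defaultdict


def build_word_graph(sentence, sampled_texts=(), dep_arcs=(), n=2):
    """Hypothetical word-only graph for one input sentence.

    sentence      -- tokens of the text to classify
    sampled_texts -- extra unlabeled token lists that enrich word context
    dep_arcs      -- (head_index, dependent_index) pairs over `sentence`,
                     assumed to come from an external dependency parser
    n             -- two words are linked when they co-occur in an n-gram
    Returns {word: {neighbor: set of edge types}}.
    """
    graph = defaultdict(lambda: defaultdict(set))

    def link(a, b, kind):
        if a != b:  # no self-loops; repeated words share one node
            graph[a][b].add(kind)
            graph[b][a].add(kind)

    for tokens in [sentence, *sampled_texts]:
        for i, w in enumerate(tokens):
            graph.setdefault(w, defaultdict(set))  # keep words with no edges as nodes
            for j in range(i + 1, min(i + n, len(tokens))):
                link(w, tokens[j], "ngram")
    for head, dep in dep_arcs:
        link(sentence[head], sentence[dep], "dep")
    return {w: dict(nbrs) for w, nbrs in graph.items()}
```

Because the graph holds only word nodes, a new sentence can be turned into its own small graph at prediction time instead of being inserted into a corpus-wide graph.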
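The multi-head attention aggregation in contribution (2) follows the general graph attention pattern: each node scores its neighbors with e_ij = LeakyReLU(a^T [W h_i || W h_j]), normalizes the scores with a softmax, and takes the weighted sum of projected neighbor features; several heads are concatenated. The pure-Python sketch below shows that pattern only; the thesis's exact layer sizes, nonlinearity and readout are not given in the abstract. Sharing `W`, `a` and the word features across all sentence graphs is what would let such a model score unseen texts in real time.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_head(h, neighbors, W, a):
    """One graph-attention head.

    h         -- {node: feature vector}
    neighbors -- {node: list of neighbor nodes}
    W         -- projection matrix shared by all nodes
    a         -- attention vector of length 2 * len(W)
    """
    z = {v: matvec(W, feat) for v, feat in h.items()}
    out = {}
    for i in h:
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]  # add a self-loop
        # e_ij = LeakyReLU(a^T [W h_i || W h_j])
        scores = [leaky_relu(sum(p * q for p, q in zip(a, z[i] + z[j]))) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        alpha = [w / total for w in weights]
        out[i] = [sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(len(z[i]))]
    return out


def multi_head(h, neighbors, heads):
    """Concatenate the outputs of several heads; each head is a (W, a) pair.
    The per-head nonlinearity (ELU in the original GAT paper) is omitted."""
    outs = [gat_head(h, neighbors, W, a) for W, a in heads]
    return {v: [x for out in outs for x in out[v]] for v in h}
```

With an all-zero attention vector every neighbor gets the same weight and the layer reduces to mean aggregation; a trained `a` lets each word attend more to informative neighbors.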
Keywords: Text classification, Semi-supervised learning, Graph neural network, Self-training, Attention mechanism