Research On Short Text Classification Algorithm Based On Graph Neural Network And Fusion Of External Features

Posted on: 2022-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Yan
GTID: 2518306332957969
Subject: Software engineering
Abstract/Summary:
In the digital age, information in fields such as media and communications spreads at great speed, driving the rise of social networks. Users can collect and browse information quickly on online platforms such as the social network Sina Weibo, the Zhihu forum, and Douban movie reviews. Text on these platforms is concise and diverse, yet rich in information about users' potential needs, interests, and behavioral intentions. How to process short text and extract valuable knowledge from it has drawn sustained attention from researchers. Compared with long text, short text has little content and sparse features, but depends strongly on contextual semantics. As a result, traditional short text classification methods based on machine learning models do not achieve satisfactory results. In recent years, expanding text data with external knowledge bases has become a research hotspot.

Deep learning models such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) are widely applied in Natural Language Processing (NLP). They prioritize the order and position of words and capture the semantic and syntactic information in local, consecutive word sequences well, but they ignore the global dependencies between non-consecutive words and the long-distance semantic features of sentences in a corpus. Graph-based approaches such as graph embedding and the Graph Convolutional Network (GCN) have recently attracted researchers' attention. A GCN can directly handle tasks with rich and complex relational structure and preserves the semantic feature information of global words in the graph. It has a wide range of applications in NLP and has produced a new method, Text Graph classification. How to use global features effectively and enrich contextual semantic information in short text classification has become a key research issue. This thesis improves the algorithm on the basis of research on Graph Neural Networks. The main work and contributions are as follows:

(1) Construction of the Text Graph. Words in the corpus serve as nodes, and edges are added according to the co-occurrence relationships between words. Because a short text corpus has sparse features, an external knowledge base is also introduced to enrich the node information. In this way, text data is converted into a graph structure for Text Graph classification.

(2) Extension of the features of the short text corpus. This thesis proposes a GCN-based method that extracts synonyms of feature words from an external knowledge base (WordNet), and shows through experiments that expanding feature information with a knowledge base effectively alleviates the brevity and feature sparsity of short text.

(3) Improvement of the GCN algorithm based on BERT word vectors. This thesis proposes a BERT + External Knowledge Base + GCN model structure. The short text, expanded with external knowledge, is passed through a pre-trained BERT model to obtain feature vectors containing contextual semantic information, and these vectors are embedded in the constructed Text Graph. Because a GCN can process arbitrary graphs, text classification is converted into document graph classification. Experiments verify that fusing external features and using word vectors to expand the node features of the Text Graph effectively improves graph classification. Finally, compared with baseline models such as BERT and Transformer, the proposed model classifies short text more accurately, and the text classification data is visualized.

(4) Improvement of the GCN algorithm based on an end-to-end Bi-LSTM model. This thesis proposes a Bi-LSTM + External Knowledge Base + GCN model structure. On the expanded text, Bi-LSTM first captures the contextual semantic features that depend on word order, and GCN then captures the long-distance feature dependencies. Experimental results show that the proposed model has higher accuracy and lower time complexity than baseline models such as Bi-LSTM and Text GCN in short text classification. Finally, the text classification data of the model is visualized.
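
The Text Graph construction and the GCN step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the thesis's implementation: the function names (build_word_graph, gcn_layer), the window size, and the use of raw co-occurrence counts as edge weights are hypothetical, and one-hot node features stand in for the BERT or Bi-LSTM vectors. The external knowledge base (WordNet) expansion is omitted. The propagation rule is the standard GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 X W).

```python
import math
from collections import Counter
from itertools import combinations


def build_word_graph(docs, window=3):
    """Build a word co-occurrence graph; returns (vocab, adjacency matrix)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for doc in docs:
        # Slide a fixed-size window over each tokenized document;
        # documents shorter than the window yield a single span.
        for start in range(max(1, len(doc) - window + 1)):
            span = sorted(set(doc[start:start + window]))
            for a, b in combinations(span, 2):
                counts[(index[a], index[b])] += 1
    n = len(vocab)
    adj = [[0.0] * n for _ in range(n)]
    for (i, j), c in counts.items():
        # Assumption: raw counts as symmetric edge weights.
        adj[i][j] = adj[j][i] = float(c)
    return vocab, adj


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    d_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[d_inv_sqrt[i] * a_hat[i][j] * d_inv_sqrt[j] for j in range(n)]
            for i in range(n)]
    hidden = matmul(matmul(norm, features), weights)
    return [[max(v, 0.0) for v in row] for row in hidden]


# Toy corpus of two tokenized short texts (illustrative data only).
docs = [["good", "movie", "plot"], ["bad", "movie"]]
vocab, adj = build_word_graph(docs)

# One-hot node features as a placeholder for contextual word vectors.
features = [[1.0 if i == j else 0.0 for j in range(len(vocab))]
            for i in range(len(vocab))]
# Fixed illustrative weights mapping 4 input features to 2 hidden units.
weights = [[0.5, -0.5], [0.1, 0.2], [-0.3, 0.4], [0.2, 0.1]]
hidden = gcn_layer(adj, features, weights)
```

In this sketch each row of hidden is a node representation that mixes a word's own features with those of its co-occurring neighbors, which is how a GCN carries information between words that are not adjacent in any single text.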
Keywords/Search Tags: Graph Convolutional Network, Short Text Classification, BERT Word Vector, Bi-LSTM