
Research On Text Classification Algorithm Based On Sequential Graph Neural Network

Posted on: 2022-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhao
Full Text: PDF
GTID: 2518306758492314
Subject: Automation Technology
Abstract/Summary:
In recent years, against the backdrop of the digital age, the wide application of network information technology in social media, e-commerce, information retrieval, and recommendation has caused the volume of complex text on the Internet to grow exponentially, including book and movie reviews, online news, product descriptions, e-mails, and more. In the era of big data, this unstructured text data provides a rich source of information for data processing and management, and how to process, classify, and mine valuable knowledge from it more effectively has long been a focus of both industry and academia.

Current text classification methods fall mainly into traditional machine learning methods and deep learning-based methods. Traditional machine learning methods usually combine feature engineering with shallow classification models, but manual feature engineering depends on domain knowledge and expresses text features weakly, so it has not achieved satisfactory results in text classification. With the wide application and continuous development of deep learning in Natural Language Processing (NLP), distributed representation models such as Word2Vec and GloVe, and deep learning models represented by the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), have been used to learn text representations for classification and have improved significantly on traditional text classification models. These deep learning models prioritize locality and sequentiality and can capture local semantic and syntactic information in continuous word sequences within documents; however, on corpora with non-consecutive and long-distance semantics, they may ignore the global word co-occurrence relationships in the text corpus.

Recently, the Graph Neural Network (GNN) has attracted extensive attention from scholars. GNNs can effectively handle tasks with rich relational structure and preserve the global structural information of graphs. In addition, GNNs can model complex semantic relationships in natural language text and are widely used in NLP tasks such as text classification, semantic role labeling, and machine translation. However, GNN-based text classification methods ignore the continuous semantic information within each document of the corpus. Furthermore, owing to over-smoothing, GNNs applied to text remain shallow and cannot capture long-distance dependencies between nodes in the text graph. How to effectively fuse the contextual features of words in documents with the structural features of words in document graphs, and how to build a deeper GNN, are the focus of this thesis.

This thesis improves the GNN algorithm for text classification. The main research contributions are as follows:
(1) A sequential feature propagation scheme. This thesis proposes a sequence-based feature propagation scheme for text analysis and representation. Specifically, each document in the corpus is trained as an individual document graph, learning the contextual features and graph-structure features of each word in the document to obtain the text representation.
(2) Construction of a deep GNN. By decoupling the feature transformation and propagation processes of the Graph Convolutional Network (GCN), and using an attention mechanism to automatically extract the information of each layer of the network, a deep GNN, called DGNN in this thesis, is obtained.
(3) A sequential GNN model based on Bi-LSTM (Bidirectional Long Short-Term Memory). This thesis proposes a graph neural network model integrating Bi-LSTM: Bi-LSTM captures the contextual semantic features between words in corpus documents, while DGNN captures the long-distance dependencies between word nodes in document graphs for text representation. Experimental comparisons with Bi-LSTM, TextGCN, and other baseline models on public English text classification datasets show the advantages of this model.
(4) A sequential GNN model based on BERT (Bidirectional Encoder Representations from Transformers). This thesis proposes a GNN model integrating BERT: a pre-trained BERT model produces, for each word in a corpus document, a feature vector containing contextual semantic information, which serves as the initial node embedding of the document graph; the DGNN model then classifies the document graph. Comparisons with baseline models such as BERT and Transformer verify the effectiveness of this model.
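The decoupling idea in contribution (2) can be illustrated with a minimal numpy sketch: features are transformed once, then only propagated for K hops, and an attention score fuses the representation from every depth so that deeper propagation does not over-smooth. All names, shapes, and the single-weight parameterization here are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def dgnn_sketch(x, adj_norm, w, attn_w, num_hops=4):
    """Decoupled deep-GNN sketch: transform node features once, run
    propagation-only steps, then fuse all hop outputs with softmax
    attention. x: [N, F] node features; adj_norm: [N, N] normalized
    adjacency; w: [F, C] transformation; attn_w: [C, 1] attention weights."""
    h = x @ w                                   # feature transformation (done once)
    outs = [h]
    for _ in range(num_hops):
        h = adj_norm @ h                        # propagation only, no new weights
        outs.append(h)
    H = np.stack(outs, axis=1)                  # [N, K+1, C] one slice per depth
    scores = H @ attn_w                         # [N, K+1, 1] raw per-depth scores
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    scores = scores / scores.sum(axis=1, keepdims=True)   # softmax over depths
    return (scores * H).sum(axis=1)             # [N, C] fused node representation
```

Because the K propagation steps share no trainable weights, depth can grow without stacking transformations, and the attention weights let each node emphasize whichever receptive-field size suits it; a document-level representation for classification could then be obtained by pooling the fused node vectors.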
Keywords/Search Tags:Graph Neural Network, Text Classification, Sequential Feature, Bi-LSTM, BERT