
Research On Text Representation And Text Classification Based On Graph Convolutional Network

Posted on: 2023-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2558306848454994
Subject: Engineering
Abstract/Summary:
Text classification is the task of accurately assigning categories to large volumes of text. It plays an important role in spam detection, article subject division, information retrieval and other fields. In most text classification scenarios, content relations exist not only within the local context but also between discontinuous passages of text. Deep learning methods based on traditional convolutional neural networks and recurrent neural networks handle the relations of continuous text within a context effectively, but they ignore the relations between globally discontinuous text. A graph convolutional network (GCN) can effectively handle tasks involving rich relationships. TextGCN, a text representation and classification model built on GCN, represents text relations as a graph and classifies texts according to the node relations in that graph, which makes it one of the better-performing graph convolutional models for classification. However, TextGCN does not solve the problem of accurately representing word features according to the topic meaning of a word and its context in the text, nor does it effectively solve the problem of using multi-dimensional word feature information to construct an accurate text feature representation. This thesis studies these problems by combining TextGCN with the LDA topic model and the Word2vec model. The specific work is as follows:

(1) A text classification model (LTGCN) is designed that builds on TextGCN, combines it with the LDA topic model and the Word2vec model, and uses nodes to represent text. The model is composed of a text representation module and a text classification module. The text representation module consists of two sub-modules: text relation weight building and word feature representation. The text relation weight building sub-module uses the LDA model to construct the weight relations among texts and words from the topic dimension. The word feature representation sub-module uses GCN to encode the features, so that word feature representations reflect the topic meaning of the words in the text. The text classification module combines the generated word feature representation with the Word2vec word feature vector to further strengthen the word feature representation, and finally applies softmax classification. Experimental results show that, compared with TextGCN, LTGCN improves classification accuracy by 0.99% and 0.27% on the Ohsumed and MR datasets, respectively.

(2) A text classification model (LGGCN) is designed that builds on LTGCN and uses a graph to represent each text. The model is composed of a text graph representation module and a text classification module. The text graph representation module consists of two sub-modules: text relation weight building and text graph feature building. The text relation weight building sub-module uses the LDA model to construct the lexical relations in a text from the topic dimension and to build the text relation graph that describes them. The text graph feature building sub-module uses a network structure in which GCN and a fully connected layer run in parallel; this structure encodes the features of the text graph and constructs a text feature matrix that effectively represents the topic information of the text. The text classification module uses GCN to process the text feature matrix into a text classification feature vector, and finally applies softmax classification. Experimental results show that, compared with LTGCN, LGGCN improves classification accuracy on the Ohsumed, R8 and R52 datasets by 0.33%, 0.73% and 1.41%, respectively.

Finally, the classification methods proposed in this thesis are compared with other current classification methods based on deep learning.
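
Both models build on the layer-wise propagation rule of a standard graph convolutional network. The following minimal Python sketch illustrates that rule on a toy text graph; it is not the thesis's implementation. The graph, the edge weights, the weight matrices `W0` and `W1`, and every function name are hypothetical, and the LDA-derived weighting and Word2vec fusion described above are not reproduced.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square, non-negative weight matrix."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def gcn_forward(adj, features, w0, w1):
    """Two-layer GCN: softmax(A_norm * ReLU(A_norm * X * W0) * W1)."""
    a_norm = normalize_adjacency(adj)
    hidden = relu(matmul(a_norm, matmul(features, w0)))
    logits = matmul(a_norm, matmul(hidden, w1))
    return softmax_rows(logits)


# Toy graph (hypothetical): nodes 0-1 are documents, nodes 2-4 are words.
# Edge weights stand in for whatever text/word weighting a model defines.
ADJ = [
    [0.0, 0.0, 0.8, 0.5, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.9],
    [0.8, 0.0, 0.0, 0.3, 0.0],
    [0.5, 0.4, 0.3, 0.0, 0.2],
    [0.0, 0.9, 0.0, 0.2, 0.0],
]
# One-hot node features (an assumption made for this sketch).
FEATURES = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
# Fixed, untrained weights chosen only so the example is deterministic.
W0 = [[0.2, -0.1, 0.4], [0.3, 0.5, -0.2], [-0.4, 0.1, 0.6],
      [0.1, 0.2, 0.3], [0.5, -0.3, 0.1]]
W1 = [[0.7, -0.2], [-0.5, 0.4], [0.1, 0.3]]

PROBS = gcn_forward(ADJ, FEATURES, W0, W1)
for node, row in enumerate(PROBS):
    print(node, [round(p, 3) for p in row])
```

Each row of `PROBS` is a probability distribution over the two hypothetical classes. In a trained model of this kind, only the rows belonging to document nodes would typically be read as predictions.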
Keywords/Search Tags: Text Classification, Graph Convolutional Network, LDA Topic Model, Word2vec Word Feature Representation Model