| With the rapid development of the Internet, a large amount of text data is generated every day, and how to classify text efficiently and accurately has become a topic of concern for many researchers. The core idea of text classification is to extract effective features from the text and assign each document to the corresponding category according to those features. Recently, graph convolutional neural networks have been increasingly applied to text classification. By fusing information from central nodes and their neighboring nodes, graph convolutional networks can capture richer text features and thereby improve classification accuracy.

However, existing text classification methods based on graph convolutional networks still suffer from several problems: the construction of the heterogeneous graph is often too simple, so the network cannot fully extract textual information; the central node assigns equal weight to all of its neighboring nodes, so more effective information cannot be obtained from the important neighbors; and all words in the text graph share the same static word vector representation across different texts and cannot be adapted to individual documents. These issues limit, to a certain extent, the capability of graph convolutional networks for text classification.

In response to the above problems, this paper carries out the following research. First, for the construction of heterogeneous graphs, a novel text data association method (WWAWD) is proposed. Texts and words are taken as nodes in the graph; the T_word2vec method is used to build connections between word and text nodes, and the D-PMI method is used for connections between word nodes. This makes the nodes in the heterogeneous graph richly connected and able to store more feature information. To address the problem of weight redistribution between the central node and its neighbors, this paper proposes the GCN_ATT model, which introduces a graph attention mechanism to assign higher weights to important neighbor nodes of the central node and suppress uninformative neighbors, so that more text features can be extracted and captured during training. Second, representing the words of the text graph with static word vectors gives words the same feature representation in different types of text, which leads to problems such as insufficient mining of textual information. This paper therefore proposes Bert_GCN, a text classification method that combines a graph convolutional neural network with the BERT model: the BERT model is fine-tuned on the text data to obtain text-specific feature representations, and the text vectors extracted by BERT are fed into the graph convolutional network, where the textual information is further propagated and fused to achieve better classification results.

Finally, the proposed algorithms are validated on five international text datasets covering categories such as news, medicine, and movie reviews. Using accuracy and F1 score as evaluation metrics, the classification performance of the above models is measured and compared with other text classification methods to verify the superiority of the models proposed in this paper.
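The abstract does not spell out the D-PMI and T_word2vec constructions, so the following is only a minimal sketch of the standard building block they extend: sliding-window pointwise mutual information (PMI) between word pairs, commonly used (e.g. in TextGCN) to weight word-word edges of the heterogeneous graph. Function and variable names are illustrative, not taken from the paper.

```python
from collections import defaultdict
from math import log

def pmi_word_edges(docs, window_size=20):
    """Sliding-window PMI between word pairs, used here to weight
    word-word edges in a heterogeneous text graph (TextGCN-style).
    `docs` is a list of tokenized documents (lists of words)."""
    windows = []
    for doc in docs:
        if len(doc) <= window_size:
            windows.append(doc)
        else:
            windows.extend(doc[i:i + window_size]
                           for i in range(len(doc) - window_size + 1))

    word_count = defaultdict(int)   # number of windows containing word i
    pair_count = defaultdict(int)   # number of windows containing both i and j
    for win in windows:
        uniq = sorted(set(win))
        for w in uniq:
            word_count[w] += 1
        for a in range(len(uniq)):
            for b in range(a + 1, len(uniq)):
                pair_count[(uniq[a], uniq[b])] += 1

    n = len(windows)
    edges = {}
    for (wi, wj), cij in pair_count.items():
        pmi = log((cij / n) / ((word_count[wi] / n) * (word_count[wj] / n)))
        if pmi > 0:                 # keep only positively associated word pairs
            edges[(wi, wj)] = pmi
    return edges
```

In the standard construction, word-document edges are weighted by TF-IDF; the T_word2vec connections described above would take the place of that step.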
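The exact architecture of GCN_ATT is not given in the abstract; the sketch below only illustrates the underlying idea of replacing the fixed, equal neighbor weights of a plain GCN layer with learned attention weights (a GAT-style layer in PyTorch). The class name and dense-adjacency interface are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGraphLayer(nn.Module):
    """One graph layer in which neighbor weights come from an attention
    mechanism instead of the uniform 1/degree weights of a plain GCN layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) 0/1 adjacency with self-loops
        h = self.proj(x)                                    # (N, out_dim)
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        # attention logit e_ij = LeakyReLU(a^T [h_i || h_j])
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1))
        e = e.masked_fill(adj == 0, float('-inf'))          # only real neighbors compete
        alpha = torch.softmax(e, dim=-1)                    # important neighbors get higher weight
        return F.elu(alpha @ h)                             # attention-weighted aggregation
```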
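For the Bert_GCN idea, a minimal sketch of how document-node features might be produced with a BERT encoder is shown below, using the Hugging Face transformers API; the fine-tuning step, the specific checkpoint, and the example documents are assumptions, not details from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical documents; in the described pipeline each document becomes a
# graph node whose initial features come from the (fine-tuned) BERT encoder.
docs = ["the market rallied today", "the patient was prescribed antibiotics"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    enc = tokenizer(docs, padding=True, truncation=True, return_tensors="pt")
    out = bert(**enc)
    doc_features = out.last_hidden_state[:, 0]   # one [CLS] vector per document

# doc_features has shape (num_docs, 768) and would initialize the document
# nodes of the heterogeneous graph before the graph convolution layers run.
```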