| Classification is the automatic labeling of data. Text classification is a traditional task in natural language processing: given a fixed set of categories, each text must be assigned to its correct category. In traditional graph convolutional network (GCN) text classification, text is represented by constructing a graph adjacency matrix. Although this representation handles information extraction from unstructured data well, it does not account for the information contained in the text itself. The sentences in a text carry rich syntactic information, and syntactic analysis can extract these syntactic features at a finer granularity, so applying it as a pre-processing step before graph convolution yields more features. In addition, because the graph is built from the distance relations among all document terms and produces a matrix whose dimension equals the vocabulary size, the graph adjacency matrix can be relatively sparse. Specifically, words in some texts appear infrequently and are weakly connected to other words, so such a text is represented in the matrix as a long path with few adjacent nodes. These long-distance text nodes lead to inaccurate feature extraction for those texts. To address these problems, the model framework is improved as follows: (1) The Text Dependency Parse Graph Convolutional Network (TextDPGCN) is proposed. First, the dependency syntax information of each text is extracted, and the subject and predicate semantic components are extracted from the text. A text dependency syntactic information matrix of the same size as the graph adjacency matrix is constructed, and the subject-predicate-object semantic information is integrated into it. This matrix is then combined with the graph adjacency matrix to form a text representation with stronger expressive ability. To further improve classification accuracy, TextDPGCN is fine-tuned with pre-trained word vectors and a pooling operation. TextDPGCN is compared with the traditional GCN algorithm on R8, R52, Ohsumed, and other benchmark datasets, and the experimental results show that our method improves text classification accuracy. (2) TextCNN-TextDPGCN and TextRNN-TextDPGCN are proposed, which integrate TextCNN and TextRNN, respectively, with TextDPGCN. TextCNN-TextDPGCN uses the multi-scale convolution kernels of a traditional CNN to extract local text features and fuses them with the features extracted by TextDPGCN, achieving more fine-grained extraction of local correlations in the text and improving classification. TextRNN-TextDPGCN uses the bidirectional LSTM of a traditional RNN to capture features of long text sequences and fuses them with the features extracted by TextDPGCN, achieving more fine-grained extraction of correlations within long sequences and better feature extraction. We evaluated both models on R8, R52, Ohsumed, and other benchmark datasets, and both improved text classification accuracy. |
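
The core idea in (1), combining a dependency-syntax matrix with the graph adjacency matrix before graph convolution, can be illustrated with a minimal NumPy sketch. The abstract does not state how the two matrices are combined, so the weighted sum and the `alpha` parameter below are assumptions, as are the function names `combine_adjacency`, `normalize_adjacency`, and `gcn_layer`. The normalization is the standard symmetric GCN normalization, not necessarily the one used in TextDPGCN.

```python
import numpy as np

def combine_adjacency(adj, dep, alpha=0.5):
    """Blend a graph adjacency matrix with a same-sized dependency matrix.

    The weighted sum is an assumed combination rule for illustration only.
    """
    if adj.shape != dep.shape:
        raise ValueError("adjacency and dependency matrices must have the same shape")
    return adj + alpha * dep

def normalize_adjacency(a):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by standard GCNs."""
    a_hat = a + np.eye(a.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution step: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)

# Toy graph: four nodes connected as a chain (a "long path" with few neighbors).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Hypothetical dependency matrix: a syntactic link between nodes 0 and 3.
dep = np.zeros((4, 4))
dep[0, 3] = dep[3, 0] = 1.0

combined = combine_adjacency(adj, dep, alpha=0.5)

rng = np.random.default_rng(0)
x = np.eye(4)                               # one-hot node features
w = rng.normal(size=(4, 2))                 # random layer weights
h = gcn_layer(normalize_adjacency(combined), x, w)
print(h.shape)
```

In this toy case the dependency link gives the two ends of the chain a direct connection, which is the kind of densification the abstract describes for sparsely connected text nodes.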