Research On Text Classification Algorithm Based On Mixed Convolution

Posted on: 2022-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: F Zeng
GTID: 2518306758974489
Subject: Applied Statistics
Abstract/Summary:
Text classification is one of the most fundamental tasks in natural language processing, and its granularity determines the quality of downstream tasks such as robotic Q&A and relation extraction. Practical application scenarios include web page classification in search engines, product classification on e-commerce platforms, and video classification on short-video platforms. Deep learning-based text classification methods automatically extract nonlinear text features, achieve high classification accuracy, and have become the mainstream approach. Among them, the Convolutional Neural Network (CNN), a classical network whose strength is local feature extraction, is good at classifying short texts but suffers from the long-distance dependency problem and cannot handle long texts effectively. Graph Convolutional Network (GCN)-based approaches use a graph to aggregate the whole corpus and obtain global text features, which addresses the poor performance of other convolutional networks on lengthy texts. However, GCN performs only moderately on short texts: a short text contributes too few nodes and edges to the text graph, which hinders information transfer in the graph, so the network cannot obtain effective features from shorter texts. Existing research lacks a text classification method that can handle both long and short texts. To resolve these problems, the main work of this thesis is as follows:

(1) GCN models text as a heterogeneous graph and extracts global features by convolving over the nodes of the whole graph, but unlike conventional convolutional networks, it lacks spatial filters that can extract local details. To address the problem that GCN cannot extract local text features well, which leads to poor classification of short texts, this thesis investigates how to build a text classification model that extracts both global and local effective features, and proposes a hybrid convolutional network text classification model based on hierarchical attention. The model combines one-dimensional convolutional layers and graph convolutional layers in parallel to form a Parallel Hybrid Convolution (PHC) unit, which fuses the local text features containing semantic information obtained by one-dimensional convolution with the global text features obtained by graph convolution. Hierarchical attention is used to strengthen PHC's feature extraction capability. Compared with state-of-the-art methods based on graph neural networks and word embeddings, the method improves accuracy by about 3% on the long-text dataset Ohsumed and the short-text dataset MR, and by about 1% on the mixed long- and short-text datasets R8 and R52, demonstrating that it can effectively classify both long and short texts.

(2) Some high-performance deep models, such as Bidirectional Encoder Representations from Transformers (BERT), and multi-network hybrid models have been proposed to improve text classification results, but they suffer from complex structures and high computational cost. This thesis therefore investigates how to build a text classification model with high classification performance and low time and space complexity, and proposes a hybrid convolutional network text classification method based on simplified Boosting. Following the idea of hierarchical learning, the method serially combines 1D convolutional layers and graph convolutional layers into a Serial Hybrid Convolution (SHC) unit, in which graph convolution serves as the shallow network that obtains global text features and 1D convolution serves as the deep network that obtains local features. A simplified Boosting algorithm (simplified-Boosting) is proposed to solve the problems of heavy computation and long training time caused by deepening the network. The method achieves a 2% improvement over the latest graph neural network-based methods on the long-text dataset Ohsumed, about 8% on the short-text dataset MR, and about 2% on the mixed short- and long-text datasets R8 and R52. Compared with the high-performance hybrid network VGCN-BERT, the method improves classification accuracy by about 1%, and the network requires an order of magnitude less computing time and an order of magnitude fewer parameters.
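
The abstract names two ways of combining graph convolution with 1D convolution, in parallel (PHC) and in series (SHC), but does not give the layer definitions. The following is a minimal sketch of the two arrangements only, under assumptions that are not taken from the thesis: the graph convolution uses the standard symmetric-normalized propagation rule ReLU(Â X W), the 1D convolution slides a scalar kernel along each node's feature vector, and the parallel branches are fused by concatenation. All function names are hypothetical, and the hierarchical attention and simplified-Boosting components are omitted.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def normalize_adjacency(adj):
    """D^-1/2 (A + I) D^-1/2: the common GCN normalization (assumed here)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def graph_conv(adj_norm, feats, weight):
    """Graph convolution: each node mixes in its neighbours' features (global view)."""
    return relu(matmul(matmul(adj_norm, feats), weight))

def conv1d(feats, kernel):
    """1D convolution along each row's feature axis with zero 'same' padding
    (local view). Assumes an odd-length scalar kernel."""
    pad = len(kernel) // 2
    out = []
    for row in feats:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([sum(k * padded[j + i] for i, k in enumerate(kernel))
                    for j in range(len(row))])
    return relu(out)

def parallel_hybrid(adj_norm, feats, weight, kernel):
    """Parallel arrangement: both branches read the same input and their
    outputs are concatenated per node (concatenation is an assumption)."""
    global_part = graph_conv(adj_norm, feats, weight)
    local_part = conv1d(feats, kernel)
    return [g + l for g, l in zip(global_part, local_part)]

def serial_hybrid(adj_norm, feats, weight, kernel):
    """Serial arrangement: graph convolution first (shallow, global),
    then 1D convolution on its output (deep, local)."""
    return conv1d(graph_conv(adj_norm, feats, weight), kernel)

# Toy text graph: three nodes on a path 0 - 1 - 2, two features per node.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]
kernel = [0.25, 0.5, 0.25]

adj_norm = normalize_adjacency(adj)
phc_out = parallel_hybrid(adj_norm, feats, weight, kernel)  # 3 rows, 2 + 2 columns
shc_out = serial_hybrid(adj_norm, feats, weight, kernel)    # 3 rows, 2 columns
```

In this sketch the parallel unit widens the representation, since each node keeps its global and local features side by side, while the serial unit keeps the width fixed and refines the graph-convolved features locally. How the thesis actually fuses or stacks the branches may differ.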
Keywords/Search Tags: Text classification, Attention mechanism, Graph convolutional network, Convolutional neural network, Hybrid convolutional networks, Ensemble learning