
Research On Text Classification Algorithm Based On Deep Ensemble Learning Of BERT, GCN And GA

Posted on: 2024-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xue
Full Text: PDF
GTID: 2568306926484744
Subject: Computer Science and Technology

Abstract/Summary:
With the explosive growth of electronic data, text has become one of the most important information carriers and the most common form of data presentation. Enabling computers to accurately locate useful information in massive text data and classify it automatically has become a research hotspot. Text classification is the key technology for solving this problem and has significant academic research value. Large-scale pre-trained models and graph neural network models are widely used in text classification. However, pre-trained models generally use a single classification channel, which limits how well results trained on the model generalize to new data samples. In addition, a single graph neural network model cannot fully exploit the information between input documents and labels, so the data are underutilized. This thesis proposes improvements for both issues. The specific contributions are as follows:

1. To address the problem that the single classification channel of a pre-trained model does not generalize well to new data samples, this thesis proposes a BERT-Boosting model based on the Boosting algorithm. By combining the BERT model with the AdaBoost ensemble learning algorithm, the BERT classifier is extended to multiple channels with per-channel weights, compensating for the shortcomings of a single classification channel and thereby improving the model's generalization ability and robustness. Experimental results show that the AdaBoost-based BERT-Boosting model achieves higher classification accuracy than the BERT-base model on multiple test sets.

2. To address the problem that a single graph neural network model ignores some of the information between input documents and labels, this thesis proposes a GACN model. The model combines the advantages of the GAT and GCN modules through joint training: the GCN module effectively learns graph-based structural information, while the attention mechanism of the GAT module learns the association information between node attributes. Experimental results show that the GACN model improves text classification performance compared with a single graph neural network.

3. Based on the stacking ensemble learning algorithm, the BERTGACN-stacking model is proposed on top of the BERT-Boosting and GACN models. The classification results of the BERT-Boosting, GACN, BERTGCN, BERTGAT, and BERTGACN models, used as base classifiers, are fed into a meta-classifier. The meta-classifier uses a support vector machine to map the input vector from a low-dimensional space to a high-dimensional space and integrates the outputs of the base classifiers, weighing the strengths of better-performing models against the bias introduced by worse-performing ones, thus improving the model's classification ability, generalization ability, and adaptability to new scenarios. Experimental results show that the BERTGACN-stacking model outperforms the baseline models in final classification performance, demonstrating the effectiveness and feasibility of the improvements proposed in this thesis.

4. Based on the BERTGACN-stacking model, a news management system is designed and implemented. The system is intended for news administrators and is divided into modules such as login and registration, news management, and comment management. The system uses the BERTGACN-stacking model to automatically classify text such as news articles and user comments, greatly improving administrators' work efficiency.
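The channel weighting behind BERT-Boosting follows the classic AdaBoost update: each channel's weight depends on its weighted training error, and misclassified samples are re-weighted for the next channel. The following is a minimal plain-Python sketch of that update for binary (+1/-1) labels; the base predictions here are trivial stand-ins for fine-tuned BERT channels, not the thesis's actual models.

```python
import math

def adaboost(base_preds, labels):
    """Compute AdaBoost channel weights (alphas) given each base
    classifier's +1/-1 predictions on the training set."""
    n = len(labels)
    w = [1.0 / n] * n                          # uniform sample weights
    alphas = []
    for preds in base_preds:
        # weighted error of this channel
        err = sum(wi for wi, p, y in zip(w, preds, labels) if p != y)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        alphas.append(alpha)
        # re-weight samples: mistakes become heavier for the next channel
        w = [wi * math.exp(-alpha * y * p)
             for wi, p, y in zip(w, preds, labels)]
        s = sum(w)
        w = [wi / s for wi in w]
    return alphas

def ensemble_predict(channel_preds, alphas):
    """Weighted vote of the channels on one sample."""
    score = sum(a * p for a, p in zip(alphas, channel_preds))
    return 1 if score >= 0 else -1
```

Note that a channel with lower weighted error receives a larger alpha, which is how the multi-channel extension assigns weights to the BERT classifiers.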
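The two branches of the GACN model can be illustrated with standard single-layer formulations: a GCN layer that propagates features over a symmetrically normalised adjacency matrix with self-loops, and a single-head GAT layer that scores neighbour pairs with a learned attention vector. The NumPy sketch below is illustrative only; combining the branches by concatenation is an assumption, since the thesis trains them jointly rather than specifying this exact fusion.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: D^{-1/2}(A+I)D^{-1/2} H W, followed by ReLU."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W, 0)

def gat_layer(A, H, W, a):
    """One single-head GAT layer: LeakyReLU attention scores over
    neighbours (with self-loops), softmax-normalised, then aggregate."""
    n = A.shape[0]
    Z = H @ W
    A_hat = A + np.eye(n)
    e = np.full((n, n), -np.inf)               # -inf masks non-edges
    for i in range(n):
        for j in range(n):
            if A_hat[i, j] > 0:
                e[i, j] = np.concatenate([Z[i], Z[j]]) @ a
    leaky = np.where(e > 0, e, 0.2 * e)
    alpha = np.exp(leaky - leaky.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)  # rows sum to 1
    return np.maximum(alpha @ Z, 0)

def gacn_forward(A, H, W_gcn, W_gat, a):
    """GACN sketch: run both branches on the same graph and combine
    their node representations (concatenation is assumed here)."""
    return np.concatenate([gcn_layer(A, H, W_gcn),
                           gat_layer(A, H, W_gat, a)], axis=1)
```

The structural information captured by the GCN branch and the attribute-association weights learned by the GAT branch end up side by side in each node's representation, which is the intuition behind combining the two modules.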
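The data flow of the stacking step can be sketched as follows: each base model emits a class-probability vector per sample, the vectors are concatenated into one meta-feature vector, and a meta-classifier decides the final label. In this plain-Python sketch the meta-classifier is a simple probability-averaging vote standing in for the SVM the thesis actually uses; the function names are illustrative, not from the thesis.

```python
def stack_features(base_outputs):
    """base_outputs[m][i] is base model m's class-probability vector for
    sample i; stacking concatenates them into one meta-feature vector."""
    n_samples = len(base_outputs[0])
    return [[p for model in base_outputs for p in model[i]]
            for i in range(n_samples)]

def vote_meta_classifier(meta_vec, n_classes):
    """Stand-in meta-classifier (the thesis uses an SVM with a kernel
    mapping to a higher-dimensional space): sum each class's probability
    across the base models and pick the argmax."""
    scores = [0.0] * n_classes
    for k, p in enumerate(meta_vec):
        scores[k % n_classes] += p   # vectors are in fixed class order
    return scores.index(max(scores))
```

Because the meta-features preserve each base model's full probability vector, a confident correct model contributes more than an uncertain or biased one, which is how stacking balances stronger and weaker base classifiers.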
Keywords/Search Tags: Text classification, Pre-trained model, Graph convolutional neural network, Graph attention neural network, Ensemble learning