| In the era of big data, data volumes keep growing, and the data are diverse and low in information density. In this context, the Internet carries a huge amount of news in text form that is difficult to manage. Text data are currently processed mainly with pre-trained language models such as BERT, which have deep and complex internal structures. Such models are first pre-trained on a large-scale corpus and then fine-tuned for different downstream tasks. Compared with traditional methods, they perform better and transfer more readily to new tasks. This article focuses on the BERT pre-trained model and applies fine-tuning to news text classification. Taking the characteristics of news text into account, it studies the internal principles, training methods, and input characteristics of BERT and proposes adjustment strategies for input processing. In the experimental phase, the model was fine-tuned on an extracted subset of THUCNews, and comparison experiments against baseline models were run to determine the best input processing strategy. With the F1 score, which reflects both precision and recall, as the evaluation metric, the model reached an F1 score of 0.956 on the test set. This result shows that BERT classifies better than the baseline models. The analysis also finds that the model suffers from defects such as training blind spots, catastrophic forgetting, and overfitting. To address these problems, the N-BERT model is proposed on the basis of BERT's structural characteristics and related theory. It introduces adversarial training, a dynamic learning rate, and layer-wise adaptive adjustment. Adversarial training addresses training blind spots by constructing adversarial samples and also improves robustness; the dynamic learning rate reduces overfitting by adjusting the learning rate at different training stages; and layer-wise adaptive adjustment applies different learning strategies to different layers, thereby alleviating catastrophic forgetting. On this basis, ensemble learning is used to build a fusion prediction network composed of N-BERT, Bi-LSTM-Attention, and TextCNN to further improve generalization. In the experimental stage, comparison experiments on these improvements were carried out; each of the introduced methods raises the F1 score to a varying degree. The fusion prediction network built with ensemble learning reached an F1 score of 0.973 on the test set. The results show that N-BERT has stronger classification performance and stability than BERT, and that ensemble learning significantly strengthens the model's generalization. |
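
The F1 score used as the evaluation metric above combines precision and recall as their harmonic mean. As a rough illustration only, the sketch below computes per-class F1 and a macro average for a multi-class task such as news classification. The abstract does not say which averaging scheme was used, so macro averaging is an assumption here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed the true label
    scores = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels, not taken from THUCNews.
y_true = ["sports", "sports", "finance", "finance"]
y_pred = ["sports", "finance", "finance", "finance"]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Because F1 is a harmonic mean, a class with high precision but low recall (or the reverse) still scores low, which is why it is a common summary metric when both error types matter.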