| Text classification is an essential research area in Natural Language Processing (NLP) and has received much attention from scholars in recent years. However, real online news text is long, information-dense, and structurally complex, which reduces the accuracy of long-text news classification. BERT (Bidirectional Encoder Representations from Transformers) pre-trained language models are good at extracting global text features, while CNNs (Convolutional Neural Networks) are good at capturing local salient features such as key phrases. To improve the performance of long-text news classification, this thesis proposes a Local Feature Convolution Network (LFCN) architecture based on BERT and CNN models. The architecture consists of four novel modules, as follows: (1) Data Preprocessing module. To work within the restrictions of the BERT pre-trained language model, e.g., the maximum length of the input data, and to improve the accuracy of long-text news classification, this thesis proposes a LEAD-based DLn (Dynamic LEAD-n) extractive summarization algorithm. In addition, this thesis constructs MCNews, a more time-sensitive long-text news classification dataset. (2) Text-Text Encoder (TTE) module. The short text pair is encapsulated into a single token sequence, and the BERT pre-trained language model is used as a weight checkpoint to initialize the TTE module. (3) Local Feature Convolution (LFC) module. To capture local salient text features, this thesis proposes a CNN-based local feature convolution module that learns local text features such as key phrases. (4) Classifier module. Multiple text feature vectors generated by different operations at different stages are fused, and downsampling and a softmax function are used to predict the categories of news texts. Finally, extensive experiments are conducted on three datasets: THUCNews, MCNews, and MCNewsPlus. The results all suggest that the new method is feasible and effective, and that it improves the performance of long-text classification of Chinese news. |
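
The LEAD idea behind the Data Preprocessing module can be illustrated with a minimal sketch. The abstract does not specify how DLn chooses n, so the code below is only a plain LEAD-style baseline under stated assumptions: it keeps leading sentences until a character budget is reached, so the number of kept sentences varies with the text. The names `split_sentences`, `lead_digest`, and `max_chars`, the punctuation set, and the default budget of 510 characters (leaving room for two special tokens within a 512-token input) are all hypothetical choices, not the thesis's algorithm.

```python
import re

# Assumed sentence-ending marks for Chinese and English text.
# Note: "." also splits decimals such as "3.5"; acceptable for a sketch.
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]*")


def split_sentences(text):
    """Return the sentences of `text`, each keeping its end punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def lead_digest(text, max_chars=510):
    """Keep leading sentences while their total length fits in max_chars.

    Hypothetical LEAD-style baseline: the number of sentences kept (n)
    is not fixed but depends on sentence lengths and the budget.
    """
    sentences = split_sentences(text)
    picked = []
    used = 0
    for sentence in sentences:
        if used + len(sentence) > max_chars:
            break
        picked.append(sentence)
        used += len(sentence)
    if not picked and sentences:
        # The first sentence alone exceeds the budget: hard-truncate it.
        return sentences[0][:max_chars]
    # Joined without spaces, which suits Chinese text.
    return "".join(picked)


if __name__ == "__main__":
    news = "今天股市上涨。投资者情绪乐观！明天可能回调？"
    print(lead_digest(news, max_chars=15))
```

With a budget of 15 characters, the example keeps the first two sentences and drops the third. A character budget is used here only because Chinese BERT vocabularies tokenize most Chinese text character by character; a token-level budget computed with the actual tokenizer would be more precise.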