Research On News Text Classification Based On Deep Learning

Posted on:2022-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:R Shen

GTID:2518306545950679

Subject:Computer application technology

Abstract/Summary:

In recent years, with the rapid development of communication technology, content on the Internet has become increasingly abundant, and the Internet has become the main way for people to obtain information. With the exponential growth of information, it is increasingly important to classify and sort information, reduce the time cost of information retrieval, and improve the efficiency of information resource utilization. Text is an important carrier of information. Traditional text classification models require manual feature engineering, which is inefficient, and the feature extraction methods they use produce high-dimensional, highly sparse representations. At present, deep learning models based on RNN (Recurrent Neural Network) and CNN (Convolutional Neural Network) architectures are widely used in text classification tasks. Bi-LSTM and Bi-GRU effectively alleviate the vanishing and exploding gradients of traditional RNNs, but they capture local semantic features insufficiently. CNNs offer good parallel computation and local semantic feature extraction, but the features they capture are limited by the size and number of sliding windows and cannot express complete semantic features.

Because a single neural network has difficulty extracting the key semantic features in a text, which leads to unsatisfactory classification results, this thesis designs a news text classification model that integrates key semantic information, extracts more semantic features, and effectively improves classification performance. In addition, because the features of news headlines are sparse, traditional deep learning models have difficulty extracting effective features from them, which leads to poor classification results. To address this, a multi-semantic feature fusion model based on BERT is designed, which effectively extracts the features in news headlines and improves the accuracy of news headline classification.

The main research contents of this thesis are as follows:
(1) A stop-word database is established based on the characteristics of the THUCNews dataset, and three sets of word vectors of different dimensions are constructed. A basic RNN is used to verify these preprocessing steps. The experimental results show that the preprocessed THUCNews dataset is more conducive to modeling.
(2) A news text classification model that integrates key semantic information is designed. Comparative experiments on the THUCNews dataset show that it captures key information features in the text, classifies news text effectively, and outperforms the comparison models.
(3) Part-of-speech information is added to the input text, and a news headline classification model based on BERT multi-semantic feature fusion is designed. Comparative experiments on the NLPCC2017 dataset show that it effectively extracts short-text features in news headline classification, achieves high classification accuracy, and outperforms the comparison models.
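The two preprocessing ideas mentioned above, stop-word filtering and adding part-of-speech information to the input text, can be illustrated with a minimal Python sketch. The abstract does not describe how the thesis implements either step, so everything here is an assumption made for illustration: the stop-word set, the function names `remove_stop_words` and `attach_pos`, the `token/TAG` format, and the English example tokens (the thesis works on Chinese news text and builds its stop-word database from THUCNews).

```python
# Hypothetical stop-word set for illustration only; the thesis builds its
# own stop-word database from the THUCNews dataset.
STOP_WORDS = {"the", "a", "of", "in"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


def attach_pos(tokens, pos_tags):
    """Pair each token with its part-of-speech tag as 'token/TAG'.

    The 'token/TAG' format is an assumption; the abstract does not say
    how part-of-speech information is combined with the input text.
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    return [f"{tok}/{tag}" for tok, tag in zip(tokens, pos_tags)]


# Example input: an already tokenized headline with hand-written tags.
tokens = ["The", "price", "of", "oil", "rose"]
filtered = remove_stop_words(tokens)
tagged = attach_pos(filtered, ["NOUN", "NOUN", "VERB"])
print(filtered)
print(tagged)
```

In practice the tokens and tags would come from a word segmenter and a part-of-speech tagger rather than being written by hand; the sketch only shows the shape of the data before it reaches a classifier.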

Keywords/Search Tags:

Text Classification, Feature Extraction, Key Semantic, BERT

Related items

1	Semantic Feature Extraction Algorithm, The Contents Of Text Classification
2	Classification Of News Short Text Based On Deep Learning
3	A Subject Classification To News Text Data Based On BERT Pre-training Model And VAE Feature Reconstruction
4	Chinese Text Feature Extraction And Classification Based On The Semantics Association
5	Research On Intent Recognition And Semantic Slot Extraction Algorithms Based On BERT
6	Research On News Texts Classification Based On Keyword Extraction And BERT Word Embedding
7	The Application Of Deep Semantic Feature Extraction Based On Bert In The Analysis Of Hierarchical Structure Of Complex Sentences
8	Classification Of Sexual Harassment Dialogue Texts Based On BERT-CNN
9	Research On WEB Page Classification Algorithms Based On Text Semantic Graph
10	Research On Semantic Matching Method Of Chinese Text Based On BERT