Research On News Text Classification Based On Deep Learning

Posted on: 2022-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: R Shen
GTID: 2518306545950679
Subject: Computer application technology

Abstract/Summary:
In recent years, with the rapid development of communication technology, content on the Internet has become increasingly abundant, and the Internet has become the main way for people to obtain information. As information grows exponentially, it becomes ever more important to classify and organize it, both to reduce the time cost of information retrieval and to make better use of information resources. Text is an important carrier of information. Traditional text classification models require manual feature engineering, which is inefficient, and the feature extraction methods they rely on produce high-dimensional, highly sparse representations. At present, deep learning models based on RNNs (Recurrent Neural Networks) and CNNs (Convolutional Neural Networks) are widely used in text classification tasks. Bi-LSTM and Bi-GRU effectively alleviate the vanishing and exploding gradient problems of traditional RNNs, but they do not capture local semantic features well enough. Although CNNs offer good parallel computation and strong local semantic feature extraction, the features they capture are limited by the size and number of their sliding windows, so they cannot express complete semantic features.

A single neural network has difficulty extracting the key semantic features of a text, which leads to unsatisfactory classification results. To address this, this thesis designs a news text classification model that integrates key semantic information; the model extracts richer semantic features and effectively improves classification performance. In addition, because news headlines have sparse features, traditional deep learning models struggle to extract effective features from them and therefore classify them poorly. To address this, a BERT-based multi-semantic feature fusion model is designed, which effectively extracts features from news headlines and improves the model's accuracy in classifying them.

The main research contents of this thesis are as follows:
(1) A stop-word database is built based on the characteristics of the THUCNews dataset, and three sets of word vectors with different dimensions are constructed. A basic RNN is used to validate these preprocessing steps. The experimental results show that the preprocessed THUCNews dataset is more conducive to modeling.
(2) A news text classification model that integrates key semantic information is designed. Comparative experiments on the THUCNews dataset show that it captures key information features in the text, classifies news text effectively, and outperforms the comparison models.
(3) Part-of-speech information is added to the input text, and a news headline classification model based on BERT multi-semantic feature fusion is designed. Comparative experiments on the NLPCC2017 dataset show that it effectively extracts short-text features for news headline classification, achieves high classification accuracy, and outperforms the comparison models.
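The claim that CNN features are limited by the size and number of sliding windows can be made concrete with a toy example. The sketch below is not the model from this thesis; it assumes a hypothetical three-word vocabulary with hand-made one-hot vectors (`EMB`) and two hand-set filters (`K2`, `K3`), all invented for illustration. It applies one convolution filter with stride 1 and no padding, then max-pools over time. A width-2 filter sees only two adjacent tokens, so it responds fully to "not good" but only partially to "not very good"; a width-3 filter is needed to catch the longer pattern.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy vocabulary with one-hot "embeddings" (illustration only).
EMB: Dict[str, Vector] = {
    "not": [1.0, 0.0, 0.0],
    "very": [0.0, 1.0, 0.0],
    "good": [0.0, 0.0, 1.0],
}


def embed(tokens: Sequence[str]) -> List[Vector]:
    """Look up the toy embedding of each token."""
    return [EMB[t] for t in tokens]


def conv_max_feature(seq: Sequence[Vector], kernel: Sequence[Vector]) -> float:
    """One convolution filter of width len(kernel), stride 1, no padding,
    followed by max-pooling over all window positions."""
    k = len(kernel)
    if len(seq) < k:
        raise ValueError("sequence is shorter than the window")
    scores = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # Dot product between the filter and the k token vectors in this window.
        score = sum(w * x
                    for w_row, x_row in zip(kernel, window)
                    for w, x in zip(w_row, x_row))
        scores.append(score)
    return max(scores)


# Hypothetical width-2 filter tuned to the adjacent bigram "not good".
K2 = [EMB["not"], EMB["good"]]
# Hypothetical width-3 filter tuned to "not <any word> good"; the middle row is zero.
K3 = [EMB["not"], [0.0, 0.0, 0.0], EMB["good"]]

if __name__ == "__main__":
    print(conv_max_feature(embed(["not", "good"]), K2))          # 2.0
    print(conv_max_feature(embed(["not", "very", "good"]), K2))  # 1.0
    print(conv_max_feature(embed(["not", "very", "good"]), K3))  # 2.0
```

Because each window covers a fixed span, CNN text classifiers usually stack several window widths, and any dependency longer than the widest window is still missed. This is the limitation the abstract attributes to single CNN models.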
Keywords/Search Tags: Text Classification, Feature Extraction, Key Semantic, BERT