
Experimental Research And System Implementation Of Deep Learning Comparison For News Text Classification

Posted on: 2021-05-11  Degree: Master  Type: Thesis
Country: China  Candidate: X Z Tang  Full Text: PDF
GTID: 2428330611468438  Subject: Computer technology
Abstract/Summary:
With the advent of the information age and the rapid development of big data, text information is becoming increasingly abundant, and redundant information is growing along with it. How to obtain valuable information and improve the efficiency of obtaining it is an important problem, and classifying these texts has become indispensable. Text classification includes sentiment analysis, label classification, and similar tasks, and news text classification is an important part of it. In recent years, scholars at home and abroad have studied natural language processing tasks under the RNN, CNN, and Transformer frameworks. They have found that RNNs parallelize poorly and depend heavily on sequence order, while CNNs for long-text classification rely on stacking convolutional layers, and overly deep stacks lead to insufficiently optimized deep network parameters. The Transformer, with its many internal components, uses the self-attention mechanism as the feature extractor for text content, which distinguishes it from CNNs and RNNs.

Against this background, this thesis carries out the following work on a news text data set. Six popular models under the three frameworks are studied and compared, with the news text data set evaluated by recall, precision, and F1 value, and comparative experiments reveal the strengths and weaknesses of each model. First, FastText, TextCNN, and DPCNN are compared, and DPCNN is improved to obtain the km-DPCNN model, which alleviates the problem of further deepening the convolution and reaches an F1 value of 92.3%, 1.18% higher than the original DPCNN. Second, because TextRNN has a natural sequential advantage and is well suited to capturing long language sequences, TextRCNN is improved by replacing its LSTM network with a GRU network, strengthening this long-sequence advantage in a targeted manner, and by combining forward and backward sequence vectors to further improve accuracy; the resulting TextGCNN model reaches an F1 value of 91.86%, 0.88% and 0.36% higher than TextRNN and TextRCNN, respectively. Among all the models compared, the most prominent is the Transformer-based BERT model, with an accuracy of 94.47%; this is the 110M-parameter model that Google pre-trained on a large corpus, which is one reason for its high accuracy. The models' practical strengths are then considered from the perspective of language extraction ability, and by calculating the agreement ratio between them it is judged that model fusion can be performed. Finally, model fusion under a weighted voting method is explored through experiments, achieving an accuracy of 95.07%; this result also illustrates the effectiveness of model fusion.

At the end of the thesis, for practical news text classification, the design and implementation of a text classification system is described according to the requirements, laying a foundation for future news recommendation work. A data collection module, data processing module, data storage module, and data classification module are designed, and a corresponding graphical interface is implemented.
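The models above are evaluated by recall, precision, and F1 value. As a minimal sketch of how these per-class metrics are computed (the labels below are hypothetical examples, not the thesis data set):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for one class (one-vs-rest)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example with made-up news categories
y_true = ["sports", "finance", "sports", "tech", "finance"]
y_pred = ["sports", "sports", "sports", "tech", "finance"]
p, r, f1 = precision_recall_f1(y_true, y_pred, "sports")
# precision 2/3, recall 1.0, F1 0.8
```

Averaging the per-class F1 values (macro-F1) gives a single score of the kind reported for km-DPCNN (92.3%) and TextGCNN (91.86%).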
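The abstract does not detail the weighted voting method used for model fusion; a generic sketch of weighted voting over the label predictions of several models might look like this (the model weights here are illustrative, loosely based on the scores reported above, not the thesis's actual weighting scheme):

```python
def weighted_vote(predictions, weights):
    """Fuse the class predictions of several models by weighted voting.

    predictions: per-model predicted labels for one sample
    weights: per-model weights, e.g. proportional to validation scores
    Returns the label with the largest total weight.
    """
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Hypothetical example: three models vote on one news article
models_pred = ["finance", "sports", "finance"]
weights = [0.923, 0.9186, 0.9447]  # illustrative, e.g. km-DPCNN, TextGCNN, BERT
fused = weighted_vote(models_pred, weights)
# "finance" wins: 0.923 + 0.9447 > 0.9186
```

Weighting lets a stronger model such as BERT outvote weaker ones when they disagree, which is consistent with the fused accuracy (95.07%) exceeding any single model's.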
Keywords/Search Tags:text classification, comparative experiment, model fusion, news classification system