Research On Classification Of News Text Based On Deep Learning

Posted on: 2020-03-22 | Degree: Master | Type: Thesis
Country: China | Candidate: T Yu | Full Text: PDF
GTID: 2518306314980259 | Subject: Master of Applied Statistics
Abstract/Summary:
With the rapid development of artificial intelligence, shallow machine learning can no longer keep up with the demands of the big data era, and researchers are continually exploring new methods. In recent years, deep learning has achieved major breakthroughs in several fields, including natural language processing (NLP), computer vision (CV), and automatic speech recognition (ASR). Text classification is a fundamental and classic research direction in NLP. Traditional text classification first preprocesses the text (cleaning, word segmentation, etc.), then manually extracts text features, and finally feeds those features into a shallow machine learning classifier for training and prediction. By contrast, deep learning simplifies the classification pipeline, reduces the loss of text information, alleviates the problems of high dimensionality and high sparsity, and improves classification accuracy and predictive performance.

The research object of this thesis is news text. Classifying news makes it possible to deliver more accurate and relevant content to readers interested in different kinds of news, and it also gives fast and complete access to information such as education levels, financial development, and government policies. This thesis applies several important deep learning models to text classification and analyzes their effectiveness. The main research work is as follows:

1) Traditional text representations (one-hot encoding, etc.) tend to produce high-dimensional, sparse vectors, and they ignore the relationships between words. In this thesis, text is converted into low-dimensional, dense numerical vectors with the word2vec tool, which captures the relatedness of words, avoids the influence of word segmentation errors, and improves classification accuracy.
2) In traditional machine learning, features must be extracted manually; because this process is highly subjective and unstable, classification accuracy is limited. This thesis uses three network structures based on different principles to extract features automatically:
(1) a Convolutional Neural Network (CNN) model, which captures local correlation features of the text;
(2) a Recurrent Neural Network (RNN) model, in which a bidirectional RNN (Bi-RNN) obtains both forward and backward sequence information and therefore captures the preceding and following context of the text well;
(3) an attention-based model, which assigns different weights to different features and combines them by weighted summation, highlighting key features and improving classification accuracy.
3) Each of these three models has its own strengths and weaknesses. This thesis therefore proposes an RCNN-Attention hybrid model that combines the three models and their respective advantages, and compares it with the single models on text classification. All four models were applied to the same news text dataset. The hybrid model achieved an accuracy of 97.9%, a recall of 98%, and an F value of 97.8%, confirming its high accuracy. Finally, the thesis summarizes its findings and proposes directions for future research.
Keywords/Search Tags: text classification, Convolutional Neural Network, Recurrent Neural Network, Attention mechanism, RCNN-Attention model
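The contrast between one-hot and dense word vectors in item 1) can be illustrated with a small sketch. It assumes a hypothetical four-word vocabulary and hand-picked three-dimensional dense vectors (`dense`); these are not trained word2vec embeddings, which would be learned from a corpus and are usually much larger. Under one-hot encoding the vector length equals the vocabulary size and any two distinct words have cosine similarity 0, so no relatedness is expressed, whereas dense vectors can place related words close together.

```python
import math


def one_hot(word, vocab):
    """Return a |V|-dimensional one-hot vector for `word`."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = 1.0
    return vec


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


vocab = ["stock", "market", "football", "match"]  # hypothetical toy vocabulary

# Hypothetical dense vectors, hand-picked for illustration (NOT trained word2vec output).
dense = {
    "stock": [0.9, 0.1, 0.0],
    "market": [0.8, 0.2, 0.1],
    "football": [0.0, 0.9, 0.3],
    "match": [0.1, 0.8, 0.4],
}

# One-hot: dimension equals vocabulary size, and distinct words are orthogonal.
sim_onehot = cosine(one_hot("stock", vocab), one_hot("market", vocab))
# Dense: related words can be similar, unrelated words dissimilar.
sim_related = cosine(dense["stock"], dense["market"])
sim_unrelated = cosine(dense["stock"], dense["football"])

print(f"one-hot stock/market:   {sim_onehot:.2f}")
print(f"dense   stock/market:   {sim_related:.2f}")
print(f"dense   stock/football: {sim_unrelated:.2f}")
```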
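The CNN feature extractor in item 2)(1) can be sketched as a one-dimensional convolution over a sequence of word vectors, followed by a ReLU and max-over-time pooling, a common design for text CNNs. The function name `conv1d_max_pool`, the filter weights, and the toy sentence are hypothetical; the abstract does not give the thesis's actual filter sizes or configuration. A width-2 filter like the one below responds to local two-word patterns, which is what "local correlation features" refers to.

```python
def conv1d_max_pool(embeddings, filt, bias=0.0):
    """Apply one width-k convolution filter over a sequence of word vectors.

    embeddings: list of n word vectors, each of dimension d
    filt:       k x d weight matrix (one row per word position in the window)
    Returns (pooled, responses): the max-over-time feature and the
    per-window ReLU activations.
    """
    k = len(filt)
    if len(embeddings) < k:
        raise ValueError("sequence is shorter than the filter width")
    responses = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        s = bias
        for word_vec, filt_row in zip(window, filt):
            s += sum(x * w for x, w in zip(word_vec, filt_row))
        responses.append(max(0.0, s))  # ReLU non-linearity
    return max(responses), responses  # max-over-time pooling


# Hypothetical 4-word sentence with 2-dimensional word vectors.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# Hypothetical width-2 filter: one weight row per word in the window.
bigram_filter = [[1.0, 0.0], [0.0, 1.0]]

pooled, responses = conv1d_max_pool(sentence, bigram_filter)
print(responses, pooled)
```

A real model would apply many filters of several widths, each contributing one pooled value to the feature vector passed to the classifier.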
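Item 2)(2) states that a Bi-RNN combines forward and backward sequence information. A minimal sketch, assuming a vanilla tanh (Elman) RNN cell rather than the LSTM or GRU cells often used in practice (the abstract does not say which cell the thesis uses), runs an RNN over the sequence in each direction and concatenates the two hidden states at every position. All weights and the names `rnn_pass` and `bi_rnn` are hypothetical.

```python
import math


def rnn_pass(inputs, W_x, W_h, b):
    """Vanilla (Elman) RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.

    Returns the list of hidden states, one per time step.
    """
    hidden = len(b)
    h = [0.0] * hidden
    states = []
    for x in inputs:
        # The right-hand side uses the previous h before it is rebound.
        h = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(len(x)))
                + sum(W_h[i][j] * h[j] for j in range(hidden))
                + b[i]
            )
            for i in range(hidden)
        ]
        states.append(h)
    return states


def bi_rnn(inputs, fwd_params, bwd_params):
    """Concatenate forward and backward RNN states at each time step."""
    forward = rnn_pass(inputs, *fwd_params)
    # Run over the reversed sequence, then reverse the outputs so that
    # position t summarizes inputs t..n-1.
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return [f_h + b_h for f_h, b_h in zip(forward, backward)]


# Hypothetical parameters: 2-dim inputs, 1 hidden unit per direction.
fwd_params = ([[0.5, -0.2]], [[0.3]], [0.0])  # (W_x, W_h, b)
bwd_params = ([[-0.4, 0.6]], [[0.2]], [0.1])
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

out = bi_rnn(seq, fwd_params, bwd_params)
# Changing only the LAST input leaves the forward state at step 0 unchanged
# but alters the backward state at step 0, which has seen the whole sequence.
out_changed = bi_rnn(seq[:-1] + [[0.0, 0.0]], fwd_params, bwd_params)
print(out[0], out_changed[0])
```

The final comparison shows why the bidirectional design helps: the representation of the first word already carries information about the end of the text.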
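The weighting step in item 2)(3) can be sketched as dot-product attention pooling. Each hidden state is scored against a context vector, the scores are normalized with softmax, and the states are summed using those weights. The scoring function, the context vector `context`, and the example states are assumptions for illustration; the abstract does not specify the thesis's attention formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax (subtracts the max score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, context):
    """Score each state by its dot product with a context vector,
    normalize the scores with softmax, and return the weighted sum."""
    scores = [sum(h_d * u_d for h_d, u_d in zip(h, context)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return pooled, weights


# Hypothetical hidden states for a 3-word input and a hypothetical context vector.
states = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
context = [1.0, 0.0]

pooled, weights = attention_pool(states, context)
print([round(w, 3) for w in weights], [round(v, 3) for v in pooled])
```

In a trained model the context vector is learned, so the weights come to emphasize the words most informative for the news category.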
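The abstract reports accuracy, recall, and an F value without stating how they were averaged over the news categories. The sketch below computes accuracy together with macro-averaged precision, recall, and F1 (the unweighted mean over classes); treating the F value as F1 and using macro-averaging are assumptions, and the labels are hypothetical rather than data from the thesis.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical news categories and predictions (not data from the thesis).
y_true = ["sports", "sports", "finance", "finance"]
y_pred = ["sports", "finance", "finance", "finance"]
print(classification_metrics(y_true, y_pred))
```

With weighted or micro averaging the same predictions would give different precision, recall, and F values, so the averaging scheme matters when comparing reported figures.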