Research On Text Classification Based On Deep Learning And Topic-driven

Posted on: 2020-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Gong
Full Text: PDF
GTID: 2518306218969959
Subject: Information and Communication Engineering
Abstract/Summary:
Text classification is a key technology for text data mining and knowledge acquisition in natural language processing. With the rapid development of the Internet, the volume of text data has exploded, the number of topics has increased dramatically, and text classification has become more difficult. How to manage massive amounts of text data efficiently by theme, and how to sort disorganized text into clear topics for orderly management, has become an urgent problem. Topic-driven classification refers to determining the topic of the text to be classified on the basis of topic-specific text data. Motivated by the success of deep learning at feature extraction in image processing, speech recognition, and computer vision, this study applies deep learning to the news text classification task. The research builds on CNN and LSTM and makes the following contributions:

1. In the traditional bag-of-words text representation, the features are high-dimensional and sparse, and context information cannot be represented. The Skip-gram model in Word2Vec is used to map each word in a document to a real-valued vector of fixed dimension, which avoids the inability of the bag-of-words method to capture contextual information. Experiments show that classification accuracy is best when the word vector dimension is 300.

2. To address deep feature extraction in news text classification, this thesis combines the advantages of CNN and BiLSTM into a BiLSTM-CNN model and applies it to the news text classification task. Experiments show that the BiLSTM-CNN model achieves better classification accuracy than a single CNN or a single BiLSTM.

3. To address the low accuracy of BiLSTM-CNN on news text classification tasks, this thesis uses the BiLSTM-CNN model to extract features and uses XGBoost to classify the extracted features, turning a weak classification problem into a strong one. The results were compared with Naive Bayes, SVM, and KNN, and the XGBoost classification model performs better than all three.
Keywords/Search Tags: Text Classification, Deep Learning, Convolutional Neural Network, Bi-directional Long Short-Term Memory, XGBoost
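
As an illustration of the Skip-gram idea mentioned in the abstract, the following minimal sketch shows how (center word, context word) training pairs can be generated from a tokenized sentence. It is not the thesis's implementation: the function name `skipgram_pairs`, the `window` parameter, and the sample sentence are all assumptions made for this example, and the training of the 300-dimensional vectors itself is not shown.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for each token and its neighbors.

    Hypothetical helper: a Skip-gram model is trained to predict each
    context word from its center word, so these pairs are its training data.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the context window to the sentence boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Made-up example sentence, not taken from the thesis data.
pairs = skipgram_pairs(["stocks", "fell", "sharply", "today"], window=1)
for center, context in pairs:
    print(center, "->", context)
```

Because the pairs come from neighboring words, words that appear in similar contexts end up with similar vectors, which is the contextual information that a bag-of-words representation cannot express.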