Research On Text Classification Based On Deep Learning

Posted on: 2020-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Sun
Full Text: PDF
GTID: 2428330590495308
Subject: Instrumentation engineering
Abstract/Summary:
With the rapid development of the Internet, people face massive amounts of text information and need an effective way to manage and classify it. Text is used in many fields, such as intelligence analysis and news classification, and therefore accounts for a large proportion of these resources. To classify texts accurately and obtain correct labels, this thesis studies text classification and evaluates several models on Chinese and English news datasets, with the goal of improving the final accuracy.

First, the general workflow of text classification is described, and the advantages and disadvantages of each algorithm are analyzed. Features are extracted with the TF-IDF algorithm, and experiments are run with traditional classification methods. The results show that this approach extracts only shallow text features and ignores the relationships between feature words, so its accuracy is slightly lower. This thesis therefore turns to the Convolutional Neural Network (CNN) model for further research.

Next, the thesis discusses and implements the specific process of applying the CNN model to text classification. Many experiments were run on the Chinese and English datasets to find the parameter settings that give the best accuracy. Accuracy reaches 96.650% on the Chinese dataset and 93.950% on the English dataset, which shows that the CNN model markedly improves text classification accuracy.

However, the softmax layer of the CNN model is weaker than traditional algorithms in classification and generalization capability. The thesis therefore proposes combined models in which the CNN serves as a feature extractor that learns salient features automatically, while Support Vector Machine (SVM) and other classification algorithms replace the softmax layer as the final classifier. With the CNN-SVM-KNN model, accuracy reaches 96.783% on the Chinese dataset and 94.425% on the English dataset, which shows that the combined models bring a modest improvement.

Finally, the softmax loss can only increase the separation between different classes and cannot reduce the variation within the same class. To address this, the AM-Softmax loss function, which is used in face recognition, is adopted as the loss function of the model, yielding the proposed AMCNN model. The AMCNN model reaches an accuracy of 97.400% on the Chinese dataset and 95.125% on the English dataset, which shows that it further improves text classification accuracy. In addition, Chinese news collected by a crawler is classified with the trained model, and the results are presented through an interface.
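
As an illustration of the AM-Softmax loss mentioned in the abstract, the following is a minimal NumPy sketch, not the thesis's implementation. It assumes the standard additive-margin formulation: features and class weights are L2-normalized, the margin m is subtracted from the target-class cosine, and the result is scaled by s before the softmax. The function name, the argument layout, and the values s = 30 and m = 0.35 are assumptions chosen for illustration; the abstract does not state the settings used.

```python
import numpy as np

def am_softmax_loss(features, weights, labels, s=30.0, m=0.35):
    """Additive-margin softmax loss (illustrative sketch).

    features: (n, d) array, e.g. CNN feature vectors (hypothetical input)
    weights:  (d, c) array, one column per class
    labels:   (n,) integer class indices
    s, m:     scale and margin; the defaults are assumptions
    """
    # L2-normalize so the dot product equals cos(theta)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = f @ w                                  # shape (n, c)

    rows = np.arange(cos.shape[0])
    logits = s * cos
    logits[rows, labels] -= s * m                # margin on the target class only

    # Numerically stable log-softmax
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

With m = 0 the function reduces to an ordinary softmax cross-entropy over scaled cosine similarities. A positive margin makes the target class harder to satisfy, which is what pushes samples of the same class closer together.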
Keywords/Search Tags: Text classification, Convolutional Neural Network, Combined model, AMCNN model