Research And Application Of Text Classification Model Combining Character Features And Topic Features

Posted on: 2020-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: W C Si
GTID: 2428330599459606
Subject: Information and Communication Engineering
Abstract/Summary:
With the explosive growth of the Internet, informatization across many fields has generated huge amounts of text data, and making full use of this data remains an open problem. Large volumes of uncategorized information are difficult to use, so a good text classification model is needed to understand and organize them. This is why text classification remains an active research topic in natural language processing. Existing text classification models generally suffer from low classification accuracy, which prevents users from accurately locating the text they need. Improving accuracy is therefore the focus of text classification research. The training time of the model must also be taken into account if the model is to have practical value.

To address these problems, this thesis surveys the latest related research at home and abroad and analyzes the advantages and disadvantages of each text classification method. Building on existing research, we propose Topic Character CNN (TC-CNN) and Topic Character CNN GRU (TC-CNN-GRU). Both models extract character features and topic features from the text, which enriches the feature information and improves classification accuracy. Building on TC-CNN, TC-CNN-GRU uses a Bi-GRU to strengthen the model's ability to capture contextual connections. TC-CNN-GRU also uses an attention mechanism to optimize the text features, which further improves classification accuracy.

Experimental results show that, compared with existing text classification models, TC-CNN and TC-CNN-GRU achieve significantly higher classification accuracy on the AG and Sogou datasets. The thesis also compares the effects of different topic models and different feature combinations on the classification accuracy of TC-CNN and TC-CNN-GRU. The results show that both models reach their highest classification accuracy when using LSA together with weighted splicing of character features and topic features. Although TC-CNN-GRU is more accurate, its training time is much longer than that of TC-CNN. From a practical point of view, TC-CNN is applied in a news gathering platform to classify news, so that the news collected by the platform has a unified classification and users can quickly locate their areas of interest.
Keywords/Search Tags:Text classification, Convolutional neural network, GRU, Attention Mechanism, Topic model, Web crawler, News gathering platform
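The abstract does not give the details of "weighted splicing", so the following is only a minimal sketch of one plausible reading: the character feature vector and the topic feature vector are scaled by complementary weights and then concatenated. The names `char_features`, `weighted_splice`, `ALPHABET`, and the weight `alpha` are hypothetical, the character-frequency vector merely stands in for the character features that a CNN would learn, and the topic vector is a made-up placeholder for what an LSA projection might produce.

```python
from collections import Counter

# Hypothetical character vocabulary: 26 letters, 10 digits, and the space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "


def char_features(text, alphabet=ALPHABET):
    """Return a normalized character-frequency vector for `text`.

    This is a simple stand-in for learned character-level CNN features.
    """
    counts = Counter(ch for ch in text.lower() if ch in alphabet)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[ch] / total for ch in alphabet]


def weighted_splice(char_vec, topic_vec, alpha=0.5):
    """Scale the two vectors by alpha and (1 - alpha), then concatenate them.

    The weighting scheme is an assumption; the thesis may define it differently.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * x for x in char_vec] + [(1.0 - alpha) * t for t in topic_vec]


# Made-up 3-dimensional topic vector, standing in for an LSA projection.
topic_vec = [0.7, 0.2, 0.1]
combined = weighted_splice(
    char_features("Stocks rally as markets open"), topic_vec, alpha=0.6
)
print(len(combined))  # 37 character dimensions + 3 topic dimensions
```

In a full model, the combined vector would be passed on to the classification layers; choosing `alpha` closer to 1 gives the character features more influence, and closer to 0 favors the topic features.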