Short Text Classification Algorithm Research Based On HLDA And CNN

Posted on: 2019-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: B Chen
GTID: 2428330596450290
Subject: Management Science and Engineering
Abstract/Summary:
In the flourishing Web 2.0 era, users are the core of the Internet and generate massive amounts of text data every day, a high proportion of which is short text. Short text includes chat records from instant messaging tools, reviews of online shopping products, public comments on current news, microblog posts, and other comments of limited length. Classifying such short texts plays an important role in many fields, such as information retrieval, information extraction, personalized recommendation, and pattern recognition, so the task has broad application and considerable research value. However, the accuracy of existing short text classification algorithms still needs to be improved. This thesis therefore proposes a short text classification algorithm based on HLDA and CNN to improve the accuracy of short text classification.

First, the thesis introduces the LDA topic model, the convolutional neural network (CNN), and the basic theory of classification as its theoretical foundation. Second, it redefines the heat of a short text and combines it with the LDA model, proposing a heat-weighted LDA topic model (Heat weighted LDA, HLDA) that extracts the topic information of short texts more accurately. Third, building on the HLDA model, it designs a short text classification algorithm based on HLDA and CNN that addresses the feature sparsity of short texts and classifies them more accurately. Finally, comparative experiments on an existing public dataset verify the validity of the proposed model and the accuracy of the algorithm.

The specific innovations are:
(1) The thesis builds the HLDA model. A short text heat factor is introduced into the original LDA model to compensate for its shortcomings in short text topic modeling, namely feature sparsity and poorly focused topics.
(2) A short text classification algorithm based on HLDA and CNN is proposed. It integrates information at both the word level and the topic level to better represent short texts, which addresses short text feature sparsity and improves classification accuracy.
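The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that "heat" is a single per-text score that scales a text's term counts before topic modeling, and that word-level and topic-level information are combined by appending a text-level topic vector to each word vector. The function names, the heat value, the word vectors, and the topic vector are all hypothetical and invented for illustration; the thesis's actual definitions may differ.

```python
from collections import Counter


def heat_weighted_counts(tokens, heat):
    """Scale a short text's term counts by its heat score.

    Assumption: heat acts as a simple multiplicative document weight.
    """
    return {word: count * heat for word, count in Counter(tokens).items()}


def word_topic_matrix(tokens, word_vectors, topic_vector):
    """Append the text-level topic vector to each known word's vector.

    Returns one row per known token; such a matrix could be fed to a CNN.
    """
    return [word_vectors[t] + topic_vector for t in tokens if t in word_vectors]


# Toy data, invented for illustration only.
tokens = ["great", "phone", "great", "battery"]
heat = 2.5  # hypothetical score, e.g. derived from reposts or likes
word_vectors = {"great": [0.1, 0.3], "phone": [0.7, 0.2], "battery": [0.5, 0.9]}
topic_vector = [0.8, 0.2]  # stand-in for a topic distribution from a topic model

counts = heat_weighted_counts(tokens, heat)
matrix = word_topic_matrix(tokens, word_vectors, topic_vector)
```

In this sketch every row of `matrix` carries both the word's own features and the topic features of the whole text, which is one simple way to enrich a sparse short text representation before classification.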
Keywords/Search Tags:Short Text Classification, HLDA, CNN, Text Features