Research On Text Classification Based On Topic Model And Ensemble Learning

Posted on: 2022-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Cao
GTID: 2518306782977369
Subject: Economic Reform
Abstract/Summary:
Text data is unstructured and cannot be processed directly by computers, so it must first be converted into vectors. How reasonably and effectively a representation captures the information in a document is a major factor in text classification performance, which makes the choice of text representation method important. The main work of this thesis is as follows.

An LDA?Vec model is built to represent text data by integrating the LDA topic model with the Word2Vec model. Experimental results show that this method outperforms the LDA+Word2Vec model.

In building this model, the topics extracted by LDA are treated globally: they take no account of category information and are therefore not clearly discriminative. The thesis screens topics by two criteria, topic concentration and topic information entropy, to remove topics that contribute little to classification.

The traditional Stacking algorithm ignores the precision of its primary (base) classifiers. To address this, the thesis uses the precision of each primary classifier to weight that classifier's first-layer output, and adds the original features to the second-layer input. Experimental results show that the improved Stacking algorithm outperforms the traditional Stacking algorithm.
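The abstract does not specify how LDA?Vec fuses the two models. For orientation only, the sketch below shows a plain concatenation of an LDA document-topic distribution with the average of the document's Word2Vec word vectors, which is one common way of combining LDA and Word2Vec features. Whether the thesis's LDA+Word2Vec baseline is built exactly this way is not stated, and this is not the LDA?Vec model itself; the function names and toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_word_vector(tokens: Sequence[str],
                     word_vectors: Dict[str, List[float]],
                     dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def concat_representation(topic_dist: Sequence[float],
                          tokens: Sequence[str],
                          word_vectors: Dict[str, List[float]],
                          dim: int) -> List[float]:
    """Hypothetical baseline: document-topic distribution followed by the mean word vector."""
    return list(topic_dist) + mean_word_vector(tokens, word_vectors, dim)


if __name__ == "__main__":
    # Toy 2-dimensional "Word2Vec" vectors; "the" is out of vocabulary and is ignored.
    wv = {"stock": [1.0, 0.0], "market": [0.0, 1.0]}
    print(concat_representation([0.7, 0.3], ["stock", "market", "the"], wv, 2))
```

In practice `topic_dist` would come from a trained LDA model and `word_vectors` from a trained Word2Vec model.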
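Topic information entropy can be illustrated under the assumption that it measures how evenly a topic's weight is spread across class labels; the abstract gives no exact definition, and topic concentration is not sketched here. For each topic k, the document-topic weights are summed per class to give p(c | k), and H(k) = -Σ_c p(c | k) log2 p(c | k). A low H(k) means the topic is concentrated in few classes and is therefore more useful for classification. The function names and the entropy threshold below are hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def topic_class_entropy(doc_topic: Sequence[Sequence[float]],
                        labels: Sequence[Hashable]) -> List[float]:
    """Entropy (bits) of each topic's weight distribution over class labels.

    doc_topic[n] is the topic distribution of document n (all of length K);
    labels[n] is its class. Lower entropy = more class-specific topic.
    """
    num_topics = len(doc_topic[0])
    # mass[k][c] = total weight of topic k over documents of class c
    mass = [defaultdict(float) for _ in range(num_topics)]
    for theta, c in zip(doc_topic, labels):
        for k, w in enumerate(theta):
            mass[k][c] += w

    entropies = []
    for k in range(num_topics):
        total = sum(mass[k].values())
        h = 0.0
        for w in mass[k].values():
            if w > 0:
                p = w / total  # p(c | k)
                h -= p * math.log2(p)
        entropies.append(h)
    return entropies


def select_topics(doc_topic: Sequence[Sequence[float]],
                  labels: Sequence[Hashable],
                  max_entropy: float) -> List[int]:
    """Keep the indices of topics whose class entropy does not exceed max_entropy."""
    ent = topic_class_entropy(doc_topic, labels)
    return [k for k, h in enumerate(ent) if h <= max_entropy]


if __name__ == "__main__":
    docs = [[0.7, 0.1, 0.2], [0.6, 0.2, 0.2], [0.1, 0.7, 0.2], [0.2, 0.6, 0.2]]
    y = ["sports", "sports", "finance", "finance"]
    print(topic_class_entropy(docs, y))
    print(select_topics(docs, y, max_entropy=0.9))
```

In the toy data, topic 2 carries the same weight in both classes (entropy 1 bit) and is dropped, while topics 0 and 1 lean towards one class each and are kept.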
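The modified Stacking step can be sketched as the construction of the second-layer input. Assumptions, since the abstract does not fix these details: each base classifier's first-layer output is a class-probability vector per sample (typically from out-of-fold predictions), its precision is macro-averaged precision on held-out data, and the weighting is a plain multiplication by that precision. All names are hypothetical, and the second-layer (meta) classifier is not shown; any classifier could be trained on the returned rows.

```python
from typing import Hashable, List, Sequence


def macro_precision(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Macro-averaged precision over the classes that appear in y_pred.

    Classes never predicted are skipped, since their precision is undefined.
    """
    if not y_pred:
        raise ValueError("y_pred must not be empty")
    classes = set(y_pred)
    total = 0.0
    for c in classes:
        truths = [t for t, p in zip(y_true, y_pred) if p == c]
        total += sum(1 for t in truths if t == c) / len(truths)
    return total / len(classes)


def build_meta_features(base_outputs: Sequence[Sequence[Sequence[float]]],
                        precisions: Sequence[float],
                        original_features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Second-layer input: precision-weighted base outputs plus original features.

    base_outputs[i][n] is the class-probability vector of base classifier i for sample n;
    precisions[i] is the precision of base classifier i.
    """
    if len(base_outputs) != len(precisions):
        raise ValueError("need one precision per base classifier")
    rows = []
    for n, features in enumerate(original_features):
        row: List[float] = []
        for outputs, prec in zip(base_outputs, precisions):
            # Scale this classifier's first-layer output by its precision.
            row.extend(prec * p for p in outputs[n])
        # Append the original features to the second-layer input.
        row.extend(features)
        rows.append(row)
    return rows


if __name__ == "__main__":
    y_true = ["a", "a", "b", "b"]
    p1 = macro_precision(y_true, ["a", "a", "b", "a"])
    p2 = macro_precision(y_true, ["a", "a", "a", "a"])
    outputs = [[[0.9, 0.1], [0.2, 0.8]], [[0.6, 0.4], [0.5, 0.5]]]
    print(p1, p2)
    print(build_meta_features(outputs, [p1, p2], [[3.0], [4.0]]))
```

Multiplying by precision scales down the outputs of less reliable base classifiers, while appending the original features gives the meta-classifier access to information the base classifiers may have discarded. Normalizing the precisions so they sum to 1 would be an equally plausible variant.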
Keywords/Search Tags: LDA, Word2Vec, LDA?Vec, topic selection, Stacking algorithm