Study and Implementation of Microblog Short Text Classification Based on LDA

Posted on: 2012-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: D H Fang
Full Text: PDF
GTID: 2248330395958396
Subject: Computer software and theory
Abstract/Summary:
With the development of Web 2.0, the number of microblog users is growing rapidly, and microblogs are becoming increasingly important to the Internet. Text classification of microblogs is essential for personalized recommendation, microblog communities, and spam filtering. However, microblog posts are short texts that contain a limited amount of information, and traditional text classification methods do not work well on such short texts.

This thesis proposes a method for classifying microblogs that applies LDA-based latent semantic space analysis, taking into account the particular characteristics of microblogs. First, LDA is used to model the class-labeled training set of microblog short texts and obtain its latent semantic distribution; the topic distribution of the test set is then inferred from the model. After the document-topic distributions of the training and test data are obtained, a feature magnification algorithm is applied to them to represent the document features. On this basis, a support vector machine is applied to classify the test data, and parameter space search is used to improve the classification results.

Content-based microblog user analysis is also studied. First, the microblog modeling results from the preceding chapter are used to build a field dictionary. PMI (pointwise mutual information) is then used to calculate the correlation between the words in a user's microblogs and the words in the field dictionary, and these correlations are aggregated to determine the topics contained in a user's microblogs and the proportion of each topic.

The experiments show that the proposed method can effectively recognize and represent document features and achieves good classification results on microblog data. In addition, the microblog user analysis gives a general overview of a user's topical interests, which provides support for personalized recommendation.
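
The PMI-based user analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not say how PMI is estimated or how the correlations are aggregated, so the following choices are assumptions made only for illustration: PMI is estimated from document-level co-occurrence counts, only positive PMI values are summed per topic, and the sums are normalized into proportions. The function names (`pmi_table`, `pmi`, `user_topic_proportions`) and the toy corpus and dictionary are hypothetical and do not come from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Estimate PMI(a, b) = log2(p(a, b) / (p(a) * p(b))) from document counts."""
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in docs:
        words = sorted(set(doc))
        word_df.update(words)
        pair_df.update(combinations(words, 2))
    table = {}
    for (a, b), c in pair_df.items():
        table[(a, b)] = math.log2((c * n) / (word_df[a] * word_df[b]))
    # A word always co-occurs with itself, so PMI(w, w) = log2(1 / p(w)).
    for w, c in word_df.items():
        table[(w, w)] = math.log2(n / c)
    return table


def pmi(table, a, b):
    """Look up PMI for an unordered pair; unseen pairs score 0.0 (assumption)."""
    key = (a, b) if a <= b else (b, a)
    return table.get(key, 0.0)


def user_topic_proportions(user_words, field_dict, table):
    """Sum positive PMI between user words and each topic's terms, then normalize."""
    scores = {}
    for topic, terms in field_dict.items():
        scores[topic] = sum(
            max(pmi(table, w, t), 0.0) for w in user_words for t in terms
        )
    total = sum(scores.values())
    if total == 0:
        return {topic: 0.0 for topic in scores}
    return {topic: s / total for topic, s in scores.items()}


# Toy corpus of tokenized posts and a toy field dictionary (both hypothetical).
docs = [
    ["match", "goal", "team"],
    ["team", "goal", "coach"],
    ["stock", "market", "price"],
    ["market", "price", "bank"],
]
field_dict = {"sports": ["goal", "team"], "finance": ["market", "price"]}
user_words = ["coach", "match", "bank"]

table = pmi_table(docs)
proportions = user_topic_proportions(user_words, field_dict, table)
print(proportions)
```

In this toy setting, two of the user's three words co-occur only with sports terms, so the sports topic receives the larger share. A real system would estimate the counts from a large microblog corpus and would probably need smoothing for rare words.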
Keywords/Search Tags:LDA, Microblog Short Text Classification, Microblog User Analysis, SVM, Topic Orientation