Research On Microblog Topic Categorization Based On SVM And K-means

Posted on: 2016-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Huang
Full Text: PDF
GTID: 2348330479454332
Subject: Software engineering
Abstract/Summary:
With the rapid development of social applications such as Twitter, massive volumes of microblog text are being generated. When these texts are assigned to topics, traditional classification methods become less applicable because microblog texts are short, contain few and irregular words, and arrive in enormous quantities. To reduce the manual annotation effort in topic categorization while maintaining categorization accuracy, this thesis proposes a classification method that combines SVM and K-means.

Microblog topic categorization is a practical application of text categorization. Common text categorization methods are supervised learning methods, which must be trained in advance on manually annotated text. Manual annotation is laborious, and it becomes even more so for texts that are short, casual, and very numerous. In contrast, text clustering is an unsupervised learning method that requires no manual annotation, but its performance is comparatively poor. To exploit the advantages of both text categorization and text clustering so that each compensates for the other's deficiencies, a microblog topic categorization method based on the cooperation of SVM and K-means is proposed. First, K-means is used to cluster the microblog texts, and the texts nearest to the centroid of each cluster are taken as the training set for the SVM. The trained SVM classifier is then used to categorize the microblog texts. Combining the two methods compensates for both the poor performance of unsupervised learning and the tedium of supervised learning.

Experiments categorize the same microblog text dataset using K-means alone, SVM alone, and the combined SVM and K-means method. The results show that, while automating the annotation of texts, the combined method does not reduce classification accuracy and achieves accuracy similar to that of SVM.
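The two-step procedure described in the abstract (cluster with K-means, train an SVM on the texts nearest each centroid, then classify with the trained SVM) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not name a library, a feature representation, or an SVM kernel, so scikit-learn, TF-IDF features, `LinearSVC`, the function name `kmeans_seeded_svm`, the `per_cluster` parameter, and the toy texts are all assumptions introduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def kmeans_seeded_svm(texts, n_topics, per_cluster, seed=0):
    """Hypothetical helper: cluster texts with K-means, train an SVM on the
    texts nearest each centroid, then label every text with that SVM."""
    # Assumption: TF-IDF bag-of-words features (the abstract does not specify).
    X = TfidfVectorizer().fit_transform(texts)

    # Step 1: unsupervised clustering; cluster ids act as automatic labels.
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=seed).fit(X)
    dist = km.transform(X)  # shape (n_texts, n_topics): distance to each centroid

    # Step 2: for each cluster, keep the members closest to its own centroid.
    train_idx = []
    for c in range(n_topics):
        members = np.where(km.labels_ == c)[0]
        nearest = members[np.argsort(dist[members, c])[:per_cluster]]
        train_idx.extend(nearest.tolist())
    train_idx = np.array(train_idx)

    # Step 3: train the SVM on the selected texts, then classify all texts.
    svm = LinearSVC().fit(X[train_idx], km.labels_[train_idx])
    return svm.predict(X), train_idx


# Toy microblog-like texts (invented for illustration only).
texts = [
    "football match goal team win",
    "team scores goal in football match",
    "football team wins the match",
    "new phone camera battery screen",
    "phone battery and screen review",
    "camera phone with big screen battery",
]
pred, train_idx = kmeans_seeded_svm(texts, n_topics=2, per_cluster=2)
print("training indices:", train_idx)
print("predicted topics:", pred)
```

Selecting only the texts nearest each centroid is meant to give the SVM training examples whose automatic cluster labels are most likely to be reliable; how many texts to keep per cluster is a tuning choice that the abstract does not state.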
Keywords/Search Tags: Microblog Topic Classification, Support Vector Machines, K-means Algorithm