
Microblog Short Text Classification Based on LDA Feature Expansion

Posted on: 2016-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: L J Liu
GTID: 2308330479950945
Subject: Computer system architecture
Abstract/Summary:
Microblogging, one of the most representative applications of the Web 2.0 era, has developed rapidly in recent years. Users register microblog accounts to publish and obtain information, and this information holds considerable commercial and scientific value. Classifying microblog posts in order to mine the key value of these short texts is therefore important: it supports personalized microblog recommendation, hot-topic detection, trend detection, and spam filtering. Microblog short-text classification nevertheless faces a serious challenge. Microblog posts are usually short and carry little information, so their features are sparse. Traditional text classification methods are designed for long texts and do not work well on microblog short texts, which makes research on classification methods tailored to them necessary.

First, to address the problem of microblog short-text classification, this thesis analyzes existing methods and proposes a feature-expansion method for short-text classification based on the latent Dirichlet allocation (LDA) model. Second, an LDA model is trained on the original features of the microblog posts; the topic distribution of each short text is obtained from the model, and topic words are then added as part of the short text's features. A support vector machine (SVM) is then used to classify the microblog short texts. Finally, experiments verify that this method significantly improves classification performance. The proposed method was compared with a traditional method without feature expansion and with feature expansion based on the Tongyici Cilin thesaurus. The experimental results show that the proposed method achieves higher accuracy than both, and that precision and recall improve significantly when the results are examined class by class.
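The feature-expansion step described in the abstract can be illustrated with a small pure-Python sketch. It is not the thesis's implementation and rests on assumptions: LDA is fitted here with collapsed Gibbs sampling (the abstract does not name an inference method), and "topic words" is taken to mean the `top_words` highest-probability words of each post's `top_topics` most probable topics. The functions `train_lda` and `expand_features` and all parameter defaults are hypothetical. The expanded token lists would then be vectorized and passed to an SVM classifier (for example scikit-learn's `LinearSVC`), a step omitted here.

```python
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (an assumed inference method).

    docs: list of token lists (the posts' original features).
    Returns (theta, phi, vocab): theta[d][k] estimates P(topic k | post d),
    phi[k][w] estimates P(vocab[w] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # topic counts per post
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = []                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def expand_features(doc, doc_theta, phi, vocab, top_topics=1, top_words=3):
    """Append topic words to a post (assumed rule: top words of its top topics)."""
    topics = sorted(range(len(doc_theta)), key=lambda k: doc_theta[k], reverse=True)
    expanded = list(doc)  # keep the original features first
    for k in topics[:top_topics]:
        ranked = sorted(range(len(vocab)), key=lambda w: phi[k][w], reverse=True)
        for w in ranked[:top_words]:
            if vocab[w] not in expanded:  # skip words already present
                expanded.append(vocab[w])
    return expanded


if __name__ == "__main__":
    posts = [["goal", "match", "team"], ["team", "coach", "goal"],
             ["phone", "screen", "battery"], ["battery", "charger", "phone"]]
    theta, phi, vocab = train_lda(posts, num_topics=2, iterations=100)
    for post, row in zip(posts, theta):
        print(expand_features(post, row, phi, vocab))
```

The sketch appends topic words rather than replacing the original tokens, which matches the abstract's description of topic words being added as part of each post's feature set.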
Keywords/Search Tags: microblog short text, latent Dirichlet allocation model, feature expansion, information gain, feature selection, support vector machine
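The keywords list information gain and feature selection, which the abstract does not describe further. A common use of information gain is to rank candidate terms by how much knowing a term's presence reduces class entropy, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), and to keep the top k terms. The sketch below shows only that standard computation; where the thesis applies it (for example before or after LDA expansion) is not stated, and the names `information_gain` and `select_features` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), term presence as a binary feature."""
    with_t = [y for doc, y in zip(docs, labels) if term in doc]
    without_t = [y for doc, y in zip(docs, labels) if term not in doc]
    p_t = len(with_t) / len(docs)
    return (entropy(list(Counter(labels).values()))
            - p_t * entropy(list(Counter(with_t).values()))
            - (1 - p_t) * entropy(list(Counter(without_t).values())))


def select_features(docs, labels, k):
    """Keep the k terms with the highest information gain (hypothetical helper)."""
    vocab = sorted({w for doc in docs for w in doc})
    return sorted(vocab, key=lambda w: information_gain(docs, labels, w), reverse=True)[:k]
```

A term that appears in every post of one class and in no post of the other yields the maximum gain, H(C), so such terms rank first.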