Short Text Classification Research Based On Sina Weibo

Posted on: 2017-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Yan
GTID: 2358330488464363
Subject: Software engineering
Abstract/Summary:
With the development of the Internet and social platforms, more and more Internet users prefer to publish and access information on microblogging platforms. Classifying microblog content can supply data for recommender systems. However, microblog texts are short and free in style, so traditional text classification techniques perform poorly on them. This thesis studies text feature selection algorithms and short-text feature expansion techniques, with the aim of selecting the feature words that contribute most to short text classification. The main work is as follows.

First, this thesis compares several feature selection algorithms that are effective in this field and proposes a chi-square test method with a minimum word-frequency threshold, which yields better classification results. The method removes part of the low-frequency words and also incorporates the improved feature selection function into the feature weight calculation.

Second, because the features of microblog short texts are relatively sparse, classification results are unsatisfactory. This thesis compares several text feature expansion algorithms and applies the LDA topic model to the microblog short text classification process.

Finally, experiments show that the proposed classification method improves classification results to a certain extent. This demonstrates that the improved chi-square feature selection algorithm based on low-frequency words, combined with LDA topic-model feature expansion, is helpful for short text classification.
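The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea behind chi-square feature selection with a minimum word-frequency filter, not the thesis's own method. It uses the standard 2x2 chi-square statistic chi2(t, c) = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)), where A counts documents of class c containing term t, B counts documents outside c containing t, C counts documents of c without t, and D counts documents outside c without t. Several details are assumptions made for illustration: the threshold is applied to document frequency (the hypothetical parameter `min_df`), a term's score is its maximum chi-square over all classes, and the feature weight is term frequency multiplied by that score (the hypothetical `weight_vector`).

```python
from collections import Counter


def chi_square(A, B, C, D):
    """Chi-square statistic for one term/class pair from a 2x2 contingency table.

    A: docs in the class that contain the term
    B: docs outside the class that contain the term
    C: docs in the class that lack the term
    D: docs outside the class that lack the term
    """
    n = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    if denom == 0:
        return 0.0
    return n * (A * D - B * C) ** 2 / denom


def select_features(docs, labels, k, min_df=2):
    """Rank terms by their maximum chi-square over classes.

    Terms whose document frequency is below min_df (an assumed threshold) are
    dropped first, because plain chi-square tends to overrate rare words.
    Returns the top-k terms and the score of every surviving term.
    """
    doc_sets = [set(d) for d in docs]
    df = Counter(t for s in doc_sets for t in s)          # document frequency
    vocab = [t for t, f in df.items() if f >= min_df]     # low-frequency filter
    class_size = Counter(labels)
    n_docs = len(docs)
    scores = {}
    for t in vocab:
        best = 0.0
        for c in class_size:
            A = sum(1 for s, y in zip(doc_sets, labels) if y == c and t in s)
            B = df[t] - A
            C = class_size[c] - A
            D = n_docs - A - B - C
            best = max(best, chi_square(A, B, C, D))
        scores[t] = best
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


def weight_vector(doc, features, scores):
    """Hypothetical weighting: term frequency times the term's chi-square score."""
    tf = Counter(doc)
    return [tf[t] * scores[t] for t in features]


# Toy, pre-segmented example (invented data, for illustration only).
docs = [
    ["goal", "team", "match"],
    ["goal", "team", "win"],
    ["stock", "price", "market"],
    ["stock", "price", "team"],
]
labels = ["sports", "sports", "finance", "finance"]
features, scores = select_features(docs, labels, k=3, min_df=2)
print(features)                                             # ['goal', 'price', 'stock']
print(weight_vector(["goal", "goal", "team"], features, scores))  # [8.0, 0.0, 0.0]
```

In this toy corpus, "match" appears in only one document yet would receive the same chi-square score (4/3) as "team", which appears in three. The `min_df` filter removes such rare terms before ranking. Any real system would first need Chinese word segmentation and a tuned threshold.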
Keywords/Search Tags:microblogging short text, chi-square test, feature selection, LDA topic model, features extended, text classification