Micro Blog Feature Expansion For Text Subject Categorization

Posted on: 2014-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: X N Lv
Full Text: PDF
GTID: 2268330392469083
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the Internet, text processing has moved toward large-scale data processing. Traditional text classification methods mostly deal with long plain text, but short texts also abound in the real world: micro-blog posts, web search snippets, forum and chat messages, news feeds, book and movie summaries, and product and user reviews. As time goes on, more and more short-text applications appear in people's lives. Micro-blogging has been the most popular of these in recent years; it gives most users a platform for sharing, disseminating, and acquiring information. Users can log in to the platform in a variety of ways to find information that interests them or to share information. For the problem of Chinese micro-blog text categorization, the main work of this thesis covers the following aspects.

First, the vector representation of a short Chinese micro-blog text contains few features, which results in feature sparsity. This thesis proposes a feature expansion method that relies on the Baidu Encyclopedia entries explaining each feature word: the useful explanations of a feature word are extracted from its Baidu Encyclopedia card, and a correspondence is built between feature words and their explanatory statements.

Second, a feature word may correspond to several explanatory statements in its encyclopedia card, or an explanatory statement may be irrelevant to the actual meaning of the feature word in the context being expanded. A method is therefore designed to eliminate the noise this feature expansion produces. For each explanatory statement, the method calculates how well its effective word set covers the effective word set of the original statement containing the expanded word, and how well its matched words cover the overall word set of that original statement. Accurate expansion results are selected on the basis of these values, which yields a mapping between each explanatory statement and the final set of expansion words.

Finally, this thesis combines the micro-blog feature expansion method, the noise elimination method, a combined feature selection method, and multiple classifiers to design a Chinese micro-blog text classification system. The experimental results show that the proposed method performs better and achieves its intended goal.
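The coverage-based noise elimination described above can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not say how the two coverage values are combined, so the equal weighting, the `threshold` parameter, the stopword-based definition of "effective" words, and the function names `coverage` and `select_expansion` are all assumptions made for illustration. English tokens stand in for segmented Chinese words.

```python
def coverage(candidate_words, reference_words):
    """Fraction of the distinct reference words that also occur in the candidate."""
    reference = set(reference_words)
    if not reference:
        return 0.0
    return len(set(candidate_words) & reference) / len(reference)


def select_expansion(post_words, explanations, stopwords=frozenset(), threshold=0.0):
    """Pick the explanatory statement that best matches the post.

    post_words:   tokens of the original micro-blog statement
    explanations: list of token lists, one per candidate explanatory statement
    Returns the effective words of the best explanation that are not already
    in the post, or [] if no explanation scores above the threshold.
    """
    # Assumption: "effective" words are simply the non-stopword tokens.
    effective_post = [w for w in post_words if w not in stopwords]

    best_words, best_score = [], threshold
    for explanation in explanations:
        effective_expl = [w for w in explanation if w not in stopwords]
        # Assumption: the two coverage values are averaged with equal weight.
        score = 0.5 * coverage(effective_expl, effective_post) \
            + 0.5 * coverage(explanation, post_words)
        if score > best_score:
            best_words, best_score = effective_expl, score

    # Keep only new words, in order, without duplicates.
    seen = set(post_words)
    expansion = []
    for word in best_words:
        if word not in seen:
            seen.add(word)
            expansion.append(word)
    return expansion


stopwords = frozenset({"a", "is", "of", "the", "that", "and"})
post = ["apple", "releases", "a", "new", "phone"]
candidates = [
    ["apple", "is", "a", "fruit", "of", "the", "apple", "tree"],
    ["apple", "is", "a", "company", "that", "makes", "phone", "and", "computer"],
]
expansion = select_expansion(post, candidates, stopwords)
print(expansion)
```

In this toy example the second explanation shares more words with the post than the first, so its words are the ones used for expansion, and the fruit sense of the word is discarded as noise.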
Keywords/Search Tags: micro-blog, feature expansion, noise cancellation, text categorization