
Feature Expansion Method For Short Text Classification

Posted on: 2014-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Guo
Full Text: PDF
GTID: 2298330422490426
Subject: Computer Science and Technology
Abstract/Summary:
In the past few years, with the rapid development of the Internet, a variety of web applications have emerged, such as Facebook, QQ, Twitter, and Sina Microblog. Along with these applications has come a large amount of text data. Many of them produce large numbers of short messages, which we call short text information. This data is very large in volume, and research on these short messages is important. Analysis of such data has applications in many areas: for example, social network recommendation systems, network information security, web data mining, topic tracking, new word discovery on the web, and public opinion monitoring. Our study focuses on feature expansion methods for short text classification.

Short text is characterized mainly by brief content, sparse features, and noisy content. Traditional statistical text classification algorithms are based on the bag-of-words paradigm, and given these characteristics of short text, they do not work well in this setting. To address this problem, we designed and implemented an algorithm based on expansion with web information retrieved by a search engine. We retrieve data from the web to obtain useful information, and then expand the short text features with it. In this way, short text messages can be extended into long text. Finally, we select appropriate classifiers for short text classification. In this thesis, we chose three common fully supervised classifiers and also tried some semi-supervised classifiers.

A problem arises with this method, however: part of the retrieved information is unsuitable for feature expansion. To address this, we propose a constraint algorithm for feature expansion, inspired by graph algorithms. The algorithm involves iterative computation and selection steps. In this way, the short text is repeatedly filtered, eventually yielding high-quality information for extending the features. We use the algorithm to eliminate the noise generated by this feature expansion method. This thesis also presents a short text keyword extraction algorithm, which combines several kinds of information about the short text: statistical information, semantic information, and the position and order in which keywords appear. In our system, this algorithm extracts reliable short text keywords for retrieving web information. Finally, we implemented our algorithms on a micro-blog corpus and built a micro-blog classification system. We also carried out multiple comparative experiments on this platform. The final experimental results show that the proposed methods are effective for short text classification.
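
The expansion-and-filtering idea described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The snippets are assumed to have already been retrieved by some search engine. The graph-inspired constraint algorithm is replaced here by a plain cosine-similarity threshold applied over a few rounds. The names `tokenize`, `cosine`, `expand_features`, `min_sim`, and `max_rounds` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_features(short_text, snippets, min_sim=0.2, max_rounds=3):
    """Expand a short text's bag of words with retrieved snippets.

    Assumption: a similarity threshold stands in for the thesis's
    constraint algorithm. In each round, snippets similar enough to the
    current feature vector are merged in; the rest are kept for the next
    round, when the enlarged vector may match them. Snippets that never
    reach the threshold are treated as noise and discarded.
    """
    features = Counter(tokenize(short_text))
    remaining = list(snippets)
    for _ in range(max_rounds):
        accepted = [
            s for s in remaining
            if cosine(features, Counter(tokenize(s))) >= min_sim
        ]
        if not accepted:
            break
        for s in accepted:
            features.update(tokenize(s))
        remaining = [s for s in remaining if s not in accepted]
    return features


if __name__ == "__main__":
    # Hypothetical short message and hypothetical retrieved snippets.
    message = "apple iphone release"
    retrieved = [
        "apple announces new iphone release date",
        "banana bread recipe with walnuts",
    ]
    print(expand_features(message, retrieved))
```

The expanded `Counter` can then be passed to any bag-of-words classifier. The threshold `min_sim` trades recall of useful context against noise, and would need tuning on real data.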
Keywords/Search Tags: micro-blog, short text, feature expansion, text categorization