
The Research And Implementation On Chinese Short Text Classification Technology

Posted on: 2015-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: D K Xiong
GTID: 2268330428964789
Subject: Computer software and theory
Abstract/Summary:
Text classification is an important topic in text mining. Given a predefined category system, it automatically assigns a category to an unlabeled text based on its content. It helps users cope with disorganized information and read large volumes of text according to their interests. Most current text classification work targets long texts, which contain more information. With the rapid development of the Internet, however, short texts are increasingly common. Because short texts carry little information, traditional long-text classification methods do not work well on them.

First, the dissertation surveys existing short text classification techniques, both domestic and international. Most of these methods depend on background knowledge that is costly to obtain, and they cannot be applied in all cases. Common short texts include BBS posts, product reviews, SMS messages, and micro-blog posts. Compared with long text, short text has distinctive features: it is brief, contains non-standard words, and constantly introduces new words. Studying short text is therefore worthwhile. On this basis, we propose a short text classification method built on a search engine and the LDA topic model.

Second, we discuss the key techniques of traditional text classification, including text preprocessing, text vectorization, feature extraction, and common classification methods, and we point out which of them need to be improved to handle short text.

We then introduce the LDA topic model and, building on it, extend and supplement the feature information of short texts in combination with a search engine.

Experimental results show that the proposed method can represent short texts and improves short text classification performance. As short text classification technology develops, it will become increasingly important for government decision-making, supervision of online information, guiding public opinion, and similar tasks.
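
The feature-expansion idea described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the search results (`FAKE_SEARCH_INDEX`), the topic-word table (`TOPIC_WORDS`), and all function names are hypothetical stand-ins, and the input is assumed to be already word-segmented (Chinese text would need a segmentation step first). The sketch gathers search snippets for a short text, picks the topic that best matches the combined words, and appends that topic's top words as extra features.

```python
from collections import Counter

# Hypothetical stand-in for search-engine results: query -> returned snippets.
FAKE_SEARCH_INDEX = {
    "iphone battery": ["phone battery life charge", "battery charge phone screen"],
}

# Hypothetical topic-word probabilities, as a trained LDA model might provide.
TOPIC_WORDS = {
    "electronics": {"phone": 0.30, "battery": 0.25, "screen": 0.20, "charge": 0.15},
    "sports": {"match": 0.30, "team": 0.25, "goal": 0.20, "score": 0.15},
}


def search_snippets(query):
    """Return snippets for a query (stubbed; a real system would call a search engine)."""
    return FAKE_SEARCH_INDEX.get(query, [])


def best_topic(tokens):
    """Pick the topic whose word probabilities sum highest over the tokens."""
    scores = {
        topic: sum(words.get(tok, 0.0) for tok in tokens)
        for topic, words in TOPIC_WORDS.items()
    }
    return max(scores, key=scores.get)


def expand(short_text, top_n=2):
    """Append the top_n most probable words of the best-matching topic."""
    tokens = short_text.split()  # assumes pre-segmented, space-separated words
    context = list(tokens)
    for snippet in search_snippets(short_text):
        context.extend(snippet.split())
    topic = best_topic(context)
    ranked = sorted(TOPIC_WORDS[topic], key=TOPIC_WORDS[topic].get, reverse=True)
    extra = [w for w in ranked if w not in tokens][:top_n]
    return tokens + extra, topic


def bag_of_words(tokens):
    """Turn the expanded token list into a simple term-count vector."""
    return Counter(tokens)
```

The expanded token list can then be vectorized and passed to any conventional classifier; the simple score-sum topic choice here is a simplification of proper LDA inference.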
Keywords/Search Tags: text mining, short text, text classification, topic model