Research On Short Text Classification Based On Multi-Granularity Topics

Posted on: 2020-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: T Xie
GTID: 2428330590958382
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the mobile Internet and the rapid spread of new media platforms such as Weibo and WeChat, a large amount of short text data has emerged, and understanding and processing short texts has become increasingly important. Text categorization is one of the important methods of text data mining and plays an important role in information retrieval, sentiment analysis and content recommendation. Because short texts are sparse, traditional text categorization methods perform poorly on them. To address this sparsity problem, a feature expansion method based on a multi-granularity topic model is proposed. LDA is used to train multiple topic models of different granularities on a background corpus, and an optimal combination of them is selected to form the topic feature space. The probability distribution of a short text over these topics is then used as an extended feature and combined with the original features, thereby achieving feature expansion for the short text. Finally, the extended feature vector is input into a classifier such as KNN or SVM to classify the short text. Using the Tencent news corpus and the text classification corpus published by Fudan University as examples, the proposed method is compared with other classical short text feature expansion and classification methods to verify its effectiveness. Compared with the traditional feature expansion method based on a single-granularity topic model, the proposed method increases the MicroF1 value by 1.81% on KNN and 3.15% on SVM, and it achieves better classification performance than the other classical feature expansion methods. The experimental results show that the proposed feature expansion method based on a multi-granularity topic model can effectively solve the sparseness problem of short text and improve the performance of text classification.
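The feature expansion step described in the abstract can be illustrated with a minimal sketch. It assumes the topic distributions of a short text have already been inferred by LDA models of different granularities. The function name `expand_features`, the renormalisation of each distribution, and all numeric values are hypothetical and chosen only for illustration; the abstract does not specify how the thesis weights or selects the topic models.

```python
from typing import List, Sequence


def expand_features(
    original: Sequence[float],
    topic_dists: Sequence[Sequence[float]],
) -> List[float]:
    """Append one topic distribution per granularity to the original features."""
    expanded = list(original)
    for dist in topic_dists:
        total = sum(dist)
        if total <= 0:
            raise ValueError("each topic distribution must have positive mass")
        # Renormalise so that every granularity contributes a proper
        # probability distribution (an assumption made for this sketch).
        expanded.extend(p / total for p in dist)
    return expanded


# Hypothetical short text: sparse term weights over a 6-word vocabulary.
term_weights = [0.0, 0.7, 0.0, 0.0, 0.7, 0.0]

# Hypothetical p(topic | text) from a coarse (2-topic) and a finer
# (4-topic) model trained on a background corpus.
coarse = [0.9, 0.1]
fine = [0.6, 0.2, 0.1, 0.1]

features = expand_features(term_weights, [coarse, fine])
print(len(features))  # 6 original + 2 coarse + 4 fine dimensions
```

The resulting vector keeps the original sparse features unchanged and adds dense topic features after them, and it is this expanded vector that would be passed to a classifier such as KNN or SVM.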
Keywords/Search Tags:short text classification, sparsity, feature expansion, multi-granularity, topic model