Text Classification Based On Improved Labeled-LDA

Posted on: 2015-01-01  Degree: Master  Type: Thesis
Country: China  Candidate: X Dong  Full Text: PDF
GTID: 2298330467463358  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the mobile Internet and social media in recent years, people can generate and share information anytime and anywhere. Much of this information is ultimately converted into text and stored. Fast, high-quality processing of text content has therefore become a focus for researchers in text mining and natural language processing.

Text classification is an important foundational technology for fast information retrieval. It is already widely used in search engines, personalized recommendation systems, public opinion monitoring, and other applications, and it is an important part of managing huge amounts of information efficiently and locating items in it accurately. However, current text classification performance remains unsatisfactory and leaves considerable room for improvement.

This thesis studies text classification based on topic models, specifically Labeled-LDA. Its main contributions are:

1) The traditional LDA model cannot incorporate external label information. Labeled-LDA models the original data and the label information together by mapping them onto each other. However, this association, in which labels and topics correspond one-to-one, causes over-fitting and results in lower classification performance. We propose an improved Labeled-LDA model that maps each class to a combination of several topics, with the topics divided into shared and private parts so that the model better matches the way texts are generated. The improved Labeled-LDA handles multi-label classification better.

2) Skewed data degrades the performance of classification systems. A novel text classification approach based on the LDA model is proposed to address this problem. Experiments show that a classification system that first processes the data with Labeled-LDA performs better than traditional methods, and that its performance remains relatively stable across corpora with different degrees of skew.
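The label-to-topic mapping described in the abstract can be illustrated with a minimal Python sketch. It only shows the bookkeeping idea that each class owns several private topics while a further set of topics is shared by all classes. The function names, the `__shared__` key, and the topic counts are hypothetical and are not taken from the thesis; the thesis's actual model and inference procedure are not reproduced here.

```python
from typing import Dict, List, Sequence

SHARED_KEY = "__shared__"  # hypothetical key for the topics shared by all classes


def build_topic_map(labels: Sequence[str],
                    private_per_label: int,
                    num_shared: int) -> Dict[str, List[int]]:
    """Assign topic ids: shared topics first, then one private block per label."""
    topic_map: Dict[str, List[int]] = {SHARED_KEY: list(range(num_shared))}
    next_id = num_shared
    for label in labels:
        topic_map[label] = list(range(next_id, next_id + private_per_label))
        next_id += private_per_label
    return topic_map


def allowed_topics(doc_labels: Sequence[str],
                   topic_map: Dict[str, List[int]]) -> List[int]:
    """Topics a document may draw from: the shared topics plus the
    private topics of every label attached to the document."""
    topics = set(topic_map[SHARED_KEY])
    for label in doc_labels:
        topics.update(topic_map[label])
    return sorted(topics)


# Illustrative labels only; any label set works the same way.
topic_map = build_topic_map(["sports", "finance"], private_per_label=2, num_shared=1)
single_label_topics = allowed_topics(["finance"], topic_map)
multi_label_topics = allowed_topics(["sports", "finance"], topic_map)
```

With `private_per_label=1` and `num_shared=0`, this reduces to the one-to-one label-topic correspondence that the abstract attributes to the original Labeled-LDA.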
Keywords/Search Tags: improved labeled-LDA, topic models, data skew, multi-label classification, feature selection