Research On The Text Classification Method Based On Correlated Topic Model

Posted on: 2011-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wang
GTID: 2178360305976545
Subject: Computer application technology
Abstract/Summary:
With the advent of the information age, a large amount of information stored as text has appeared on the Internet, in digital libraries, and on corporate intranets. Text classification based on artificial intelligence has become an important technology for processing this text information. Feature extraction and text representation are key to the performance of text classification. The correlated topic model (CTM) has proven to be an effective text representation model: it can reduce dimensionality without losing important information, and it can also speed up classification and improve classification accuracy. However, several problems remain in applying it as a text representation model, such as how to choose the optimal number of topics and how to select features.

In this thesis, text classification based on the correlated topic model is studied in depth. The main work is as follows:
1: Analyzed and summarized the advantages and disadvantages of using CTM for text classification.
2: Proposed a method for selecting the number of topics in CTM based on density-based clustering, in order to optimize the model.
3: Proposed a CTM feature extraction method that combines a genetic algorithm with improved mutual information to reduce redundant features.
4: Built a CTM text classification experimental system based on the proposed methods, which both verified their effectiveness and provides a basis for the further development of practical text classification systems.

Finally, the research work of the thesis is summarized and directions for future development are outlined.
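The abstract does not spell out the density-based clustering procedure used to choose the number of topics (item 2). The sketch below is one plausible reading, under stated assumptions: the topic-word distributions for each candidate K are assumed to come from already-fitted models (here they are toy vectors), topics that have near-duplicate neighbours (cosine similarity of at least `min_sim`, counted in the style of DBSCAN core points) are taken as a sign that K is too large, and all helper names and thresholds are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two topic-word distributions."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_similarity(topics):
    """Mean pairwise cosine similarity; lower means better-separated topics."""
    pairs = list(combinations(topics, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0

def core_topics(topics, min_sim, min_pts):
    """Count DBSCAN-style core points: topics with at least min_pts other
    topics whose cosine similarity is >= min_sim (near-duplicate topics)."""
    count = 0
    for i, u in enumerate(topics):
        neighbours = sum(
            1 for j, v in enumerate(topics) if j != i and cosine(u, v) >= min_sim
        )
        if neighbours >= min_pts:
            count += 1
    return count

def select_topic_number(models, min_sim=0.5, min_pts=1):
    """models maps each candidate K to its K topic-word distributions.
    Choose the K with the fewest dense (redundant) topics, then the lowest
    average similarity; remaining ties go to the smaller K."""
    candidates = {k: t for k, t in models.items() if len(t) >= 2}
    return min(
        candidates,
        key=lambda k: (core_topics(candidates[k], min_sim, min_pts),
                       avg_similarity(candidates[k]), k),
    )

# Toy topic-word distributions over a 4-word vocabulary (made-up numbers).
models = {
    2: [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]],
    3: [[0.7, 0.2, 0.1, 0.0], [0.6, 0.3, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]],
}
print(select_topic_number(models))  # → 2
```

With K = 3 the first two topics are near-duplicates (cosine similarity about 0.98), so both count as core points and K = 2 is preferred. Because ties favor fewer topics, the candidate range in practice should start from a K that is large enough for the corpus.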
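Item 3 combines a genetic algorithm with "improved mutual information", but the abstract does not define the improvement. The sketch below therefore uses classic mutual information, MI(t, c) = log(A·N / ((A + C)(A + B))), as the relevance score and a Jaccard co-occurrence overlap as a stand-in redundancy measure. Both measures, the fitness function, and the GA settings (truncation selection, one-point crossover, bit-flip mutation) are assumptions for illustration, not the thesis's method.

```python
import math
import random
from itertools import combinations

def mutual_information(docs, labels, term, cls):
    """Classic MI(t, c) = log(A*N / ((A + C) * (A + B))), where A = docs of
    class c containing t, B = other docs containing t, C = docs of c without t."""
    n = len(docs)
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
    if a == 0:
        return float("-inf")  # term never occurs in this class
    return math.log(a * n / ((a + c) * (a + b)))

def relevance_scores(docs, labels, terms):
    """Score each term by its maximum MI over all classes."""
    classes = set(labels)
    return {t: max(mutual_information(docs, labels, t, c) for c in classes) for t in terms}

def overlap_scores(docs, terms):
    """Jaccard overlap of the document sets of each term pair (redundancy proxy)."""
    sets = {t: {i for i, d in enumerate(docs) if t in d} for t in terms}
    return {(u, v): len(sets[u] & sets[v]) / len(sets[u] | sets[v])
            for u, v in combinations(terms, 2)}

def fitness(mask, terms, relevance, overlap, lam):
    """Total relevance of the chosen terms minus lam * their pairwise redundancy."""
    chosen = [t for t, bit in zip(terms, mask) if bit]
    penalty = sum(overlap[(u, v)] for u, v in combinations(chosen, 2))
    return sum(relevance[t] for t in chosen) - lam * penalty

def ga_select(docs, labels, lam=1.0, pop_size=20, generations=40, p_mut=0.1, seed=0):
    """Evolve bit masks over the vocabulary; each bit keeps or drops one term."""
    rng = random.Random(seed)
    terms = sorted(set().union(*docs))
    relevance = relevance_scores(docs, labels, terms)
    overlap = overlap_scores(docs, terms)
    score = lambda m: fitness(m, terms, relevance, overlap, lam)
    n = len(terms)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return [t for t, bit in zip(terms, best) if bit]

# Toy corpus: each document is a set of terms (made-up data).
docs = [{"goal", "match", "team"}, {"team", "coach", "match"},
        {"stock", "market", "price"}, {"market", "price", "bank"}]
labels = ["sport", "sport", "finance", "finance"]
print(ga_select(docs, labels))
```

On this toy data every term that occurs in only one class gets the same MI, log(N / N_c), however often it occurs. This insensitivity to term frequency is a well-known weakness of plain MI and a typical motivation for modified MI variants.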
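CTM represents each document by a K-dimensional vector of topic proportions θ obtained by applying the logistic (softmax) transform to a vector η drawn from a multivariate Gaussian N(μ, Σ); the full covariance Σ is what lets topics be correlated, unlike the Dirichlet prior of LDA. The sketch below shows only the downstream step behind item 4, classification in this low-dimensional topic space. It assumes η vectors are already available from a fitted CTM (the numbers here are made up), and it uses a nearest-centroid classifier as a placeholder, since the abstract does not name the classifier used in the thesis.

```python
import math

def softmax(eta):
    """Logistic transform: map unconstrained eta to topic proportions theta."""
    m = max(eta)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in eta]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def train_centroids(thetas, labels):
    """Average the topic-proportion vectors of each class."""
    sums, counts = {}, {}
    for theta, y in zip(thetas, labels):
        acc = sums.setdefault(y, [0.0] * len(theta))
        for k, p in enumerate(theta):
            acc[k] += p
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def classify(theta, centroids):
    """Assign the class whose centroid is most cosine-similar to theta."""
    return max(centroids, key=lambda y: cosine(theta, centroids[y]))

# Hypothetical per-document eta vectors for K = 3 topics (made-up numbers).
train_eta = [[2.0, 0.0, -1.0], [1.5, 0.2, -0.5], [-1.0, 0.0, 2.0], [-0.5, 0.3, 1.8]]
train_labels = ["sport", "sport", "finance", "finance"]
centroids = train_centroids([softmax(e) for e in train_eta], train_labels)
print(classify(softmax([1.8, 0.1, -0.8]), centroids))  # → sport
```

Each document is reduced from a vocabulary-sized vector to K numbers before classification, which is the dimensionality reduction the abstract credits CTM with.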
Keywords/Search Tags: Text Classification, Clustering, Correlated Topic Model, Genetic Algorithm, Improved Mutual Information