
Application Of CTM Model Optimization Feature Selection In Text Categorization

Posted on: 2017-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Yang
GTID: 2348330488482874
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, vast amounts of information are being produced, and we have moved from an era of scarce information resources to the information-rich era of big data. Extracting the required information quickly and efficiently from this mass of data is a major challenge for today's information technology, and text categorization is one way to address it. Feature selection and text representation are important factors affecting text categorization. At present, the Correlated Topic Model (CTM), a topic model, is used as an effective text representation method in text categorization. The model captures the correlation between topics and reduces the dimensionality of the text data while preserving as much of the information as possible, which improves both classification accuracy and speed. However, feature selection for the model and determining the optimal number of topics remain major problems.

This thesis studies feature selection and the determination of the optimal number of topics for the CTM model, and completes the following work:
(1) Analyzing the difficulties that text representation models encounter in text classification and the advantages of CTM for text representation;
(2) Using complexity and log-likelihood measures to determine the best number of topics for the CTM model;
(3) Applying a feature selection method based on principal component analysis (PCA) and mutual information (MI) to the CTM model to reduce redundant features;
(4) Based on the above theory, building a CTM model and a text classification system in the R language to demonstrate the validity of these methods, providing support for the further development of text classification applications.

Finally, the work is summarized and directions for future research are discussed.
Keywords/Search Tags:text representation, topic number, feature selection, CTM, PCA, MI
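
The abstract does not say how the mutual information (MI) score in step (3) is computed. As an illustration only, the following minimal sketch assumes the common formulation in which each term is scored by the MI between its presence in a document and the document's class label. It is written in Python rather than the R used in the thesis, and the function names `mutual_information` and `select_top_terms` are hypothetical, not taken from the thesis.

```python
import math
from collections import Counter


def mutual_information(docs, labels):
    """Score each term by the MI (in bits) between term presence and class label.

    docs   : list of token lists, one per document
    labels : list of class labels, aligned with docs
    Assumption: presence/absence of a term is used, not its frequency.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_counts = Counter(labels)

    scores = {}
    for term in vocab:
        # Joint counts over (term present?, class); only observed pairs are stored.
        joint = Counter((term in d, c) for d, c in zip(doc_sets, labels))
        present = sum(v for (t, _), v in joint.items() if t)
        term_counts = {True: present, False: n - present}

        mi = 0.0
        for (t, c), n_tc in joint.items():
            # p(t,c) * log2( p(t,c) / (p(t) * p(c)) ), written with raw counts.
            ratio = n_tc * n / (term_counts[t] * class_counts[c])
            mi += (n_tc / n) * math.log2(ratio)
        scores[term] = mi
    return scores


def select_top_terms(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]
```

In a pipeline like the one the abstract outlines, terms with low scores would be dropped before or after the topic model is fitted. Where exactly the thesis applies this filter, and how it combines it with PCA, is not stated in the abstract.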