An Application Research Of LDA Model On Text Classification

Posted on: 2017-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Cong
GTID: 2428330548483812
Subject: Computer technology
Abstract/Summary:
High dimensionality, sparseness, and ambiguity are the main factors that keep text classification from achieving excellent performance. Traditional feature selection algorithms, which assume that terms are conditionally independent, can address the first two problems, but they ignore semantic information. To take latent semantics into account, the LDA topic model is used for feature extraction. However, LDA does not handle the input space effectively: when it assigns a topic label to each word in the original space, it retains the non-action words, which strongly distorts the topic probability distribution. To overcome this shortcoming, this paper proposes a new method, LSI_LDA. LSI first maps the input space to a low-dimensional space and filters out the non-action words, so that LDA performs topic labeling in a simpler and clearer space, obtains a more precise topic distribution, and improves its modeling capability. The idea behind this pre-filter is as follows. Traditional feature selection algorithms based on conditional independence and mRMR extract a subset of the original features and do not change the interpretation of the original semantics. LSI, by contrast, uses singular value decomposition to map the original space into a lower-dimensional space and then generates new feature relationships in terms of the latent semantics, seeking the features that best represent the documents.
Keywords/Search Tags: text classification, feature extraction, mRMR, LSI, LDA