
Bayesian Feature Selection For Text Classification

Posted on: 2012-04-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Z Feng
Full Text: PDF
GTID: 1228330368496469
Subject: Probability theory and mathematical statistics
Abstract/Summary:
The automated classification of texts into predefined categories has attracted booming interest, owing to the increased availability of documents in digital form and the ensuing need to organize them. An important problem in text classification is feature selection, whose goals are to improve classification effectiveness, computational efficiency, or both. Because of class imbalance and feature sparsity in social text collections, filter methods, which judge the class character of each feature by measuring different aspects of its relationship to the class structure, may work poorly. In addition, they either select relevant features alone or add a redundancy analysis step afterwards, and in both cases lose interaction features.

In this work, we perform feature selection during training, automatically selecting the best feature subset by learning from a set of preclassified documents. We propose a generative model and handle the feature selection problem by introducing a binary exclusion/inclusion latent vector, which can be updated via an efficient Metropolis search. We describe the influence among features by edges and define feature relevance schematically. The feature selection problem is thereby turned into an optimization problem.

Under the Naive Bayes structure assumption, we give a Bayesian feature selection paradigm and obtain the Bayesian class feature factor and the Bayesian feature averaging classifier, in which the uncertainty of the features' class character is taken into account. Under the tree structure assumption, we select the interaction features and remove the redundant ones simultaneously. After the support graph is determined by multiple conditional independence tests, and with decomposable priors for both tree structures and parameters, this problem becomes tractable. Examples illustrate the effectiveness of the approaches.
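The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of a binary inclusion vector updated by Metropolis moves under a Naive Bayes structure. It assumes binary (present/absent) term features, a Beta(1, 1) prior on each term probability, and an independent prior inclusion probability per feature. All names (`feature_log_scores`, `metropolis_search`, `prior_inclusion`) are hypothetical, and this is not the dissertation's actual model.

```python
import math
import random

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(ones, total, a=1.0, b=1.0):
    """Log marginal likelihood of `ones` successes in `total` Bernoulli
    draws when the success probability has a Beta(a, b) prior."""
    return log_beta(a + ones, b + total - ones) - log_beta(a, b)

def feature_log_scores(X, y):
    """For each feature, return (log evidence if included, log evidence
    if excluded). Included: one term probability per class.
    Excluded: a single term probability shared by all classes."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        excluded = log_marginal(sum(col), len(col))
        included = 0.0
        for c in classes:
            sub = [v for v, label in zip(col, y) if label == c]
            included += log_marginal(sum(sub), len(sub))
        scores.append((included, excluded))
    return scores

def metropolis_search(X, y, n_iter=2000, prior_inclusion=0.5, seed=0):
    """Single-bit-flip Metropolis sampler over the inclusion vector gamma.
    Returns the fraction of iterations in which each feature was included."""
    rng = random.Random(seed)
    scores = feature_log_scores(X, y)
    d = len(scores)
    gamma = [0] * d  # start with every feature excluded
    log_prior_odds = math.log(prior_inclusion) - math.log(1.0 - prior_inclusion)
    counts = [0] * d
    for _ in range(n_iter):
        j = rng.randrange(d)  # symmetric proposal: flip one random bit
        included, excluded = scores[j]
        # Change in log posterior if gamma[j] goes from 0 to 1.
        delta = (included - excluded) + log_prior_odds
        if gamma[j] == 1:
            delta = -delta  # the proposed move is 1 -> 0 instead
        if delta >= 0 or rng.random() < math.exp(delta):
            gamma[j] = 1 - gamma[j]
        for k in range(d):
            counts[k] += gamma[k]
    return [c / n_iter for c in counts]

# Toy data: feature 0 equals the class label, feature 1 alternates
# regardless of class.
y = [0] * 20 + [1] * 20
X = [[label, i % 2] for i, label in enumerate(y)]
print(metropolis_search(X, y))
```

In this simplified setting the posterior factorizes over features, so the inclusion probabilities could also be computed in closed form. The sampler is shown only to illustrate the search mechanism, which becomes useful once features interact, as in the tree-structured case the abstract mentions.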
Keywords/Search Tags: Bayesian model selection, Text classification, Generative model, Graphical model, Naive Bayes, Spanning tree, MCMC