
Feature selection and discriminant analysis in data mining

Posted on: 2005-06-10
Degree: Ph.D.
Type: Dissertation
University: University of Florida
Candidate: Youn, Eun Seog
Full Text: PDF
GTID: 1458390011953006
Subject: Computer Science
Abstract/Summary:
The problem of feature selection is to find a subset of features that is optimal for classification. A critical part of feature selection is ranking features by their importance for classification, and this dissertation deals with feature ranking. The first method is based on support vectors (SVs), the sample vectors that lie near the boundary between two classes. We show how feature ranking can be done using only the support vectors. While SV-based feature ranking can be applied to any discriminant analysis, here we consider two linear discriminants, the support vector machine (SVM) and Fisher's linear discriminant (FLD). Features are ranked either by the weight associated with each feature or by recursive feature elimination (RFE). Both schemes are shown to be effective in a variety of domains. The second method of feature ranking is based on naive Bayes. The naive Bayes classifier has been used extensively in text categorization, but despite its success on that problem it has not previously been used for feature selection. We develop a new feature scaling method, called class-dependent term weighting (CDTW), and a new feature selection method, CDTW-NB-RFE, which combines CDTW with RFE. In our experiments, CDTW-NB-RFE outperformed all five popular feature ranking schemes tested on text datasets. The method has also been extended to continuous data domains.
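The weight-based and RFE ranking schemes described above can be illustrated concretely. The following is a minimal sketch, assuming a scikit-learn-style toolkit; the toy dataset and parameter values are placeholders, not anything from the dissertation. In a linear discriminant f(x) = w·x + b, the magnitude |w_j| reflects how much feature j moves the decision, so features can be ranked by |w_j| directly, while RFE instead retrains the classifier repeatedly and discards the lowest-weight feature each round.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC
    from sklearn.feature_selection import RFE

    # Toy two-class data standing in for a real domain.
    X, y = make_classification(n_samples=200, n_features=30,
                               n_informative=5, random_state=0)

    # Direct weight-based ranking: sort features by |w_j| from a
    # trained linear SVM.
    svm = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
    weight_rank = np.argsort(-np.abs(svm.coef_.ravel()))

    # RFE: retrain and drop the lowest-weight feature each round;
    # the surviving features are the highest ranked.
    rfe = RFE(LinearSVC(C=1.0, max_iter=5000),
              n_features_to_select=5).fit(X, y)
    print("top features by |w|:", weight_rank[:5])
    print("features kept by RFE:", np.flatnonzero(rfe.support_))

The naive-Bayes variant can be sketched the same way. The abstract does not define CDTW, so the elimination criterion below is a generic class-conditional log-odds score used purely as a stand-in for it; nb_rfe_ranking is a hypothetical helper, not the dissertation's CDTW-NB-RFE.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    def nb_rfe_ranking(X, y, step=1):
        """Return feature indices in order of elimination (least to
        most important) using RFE with a naive Bayes classifier."""
        remaining = list(range(X.shape[1]))
        eliminated = []
        while remaining:
            nb = MultinomialNB().fit(X[:, remaining], y)
            # Score each surviving feature by |log P(w|c1) - log P(w|c0)|:
            # near-zero scores mean the feature barely separates classes.
            scores = np.abs(nb.feature_log_prob_[1] - nb.feature_log_prob_[0])
            worst = np.argsort(scores)[:step]
            # Pop in reverse index order so positions stay valid.
            for i in sorted(worst, reverse=True):
                eliminated.append(remaining.pop(i))
        return eliminated

    # Usage on toy term-count data (naive Bayes expects nonnegative counts):
    rng = np.random.default_rng(0)
    Xc = rng.poisson(1.0, size=(100, 20))
    yc = rng.integers(0, 2, size=100)
    order = nb_rfe_ranking(Xc, yc)
    print("most important feature:", order[-1])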
Keywords/Search Tags: Feature, Method