
Enhancing pattern recognition using evolutionary computation for feature selection and extraction with application to the biochemistry of protein-water binding

Posted on: 2001-12-20
Degree: Ph.D.
Type: Dissertation
University: Michigan State University
Candidate: Raymer, Michael Lee
GTID: 1468390014958361
Subject: Computer Science
Abstract/Summary:
Statistical pattern recognition techniques classify objects in terms of a representative set of features. The selection and quality of the features representing each object have a considerable bearing on the success of subsequent pattern classification. Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. Many current feature extraction techniques involve linear transformations of the original features to produce new features. While useful for data visualization and increasing classification efficiency, these techniques do not necessarily reduce the number of features that must be measured, since each new feature may be a linear combination of some or all of the original features. Here a new approach is presented in which feature selection, feature extraction, and classifier training are performed simultaneously using evolutionary computing (EC). This method is tested in conjunction with a k-nearest-neighbors classifier and is shown to outperform other current methods for feature selection and extraction in terms of minimizing the number of features employed while maximizing classification accuracy. Two new classifiers based on the naive Bayes classifier are developed in conjunction with the EC feature selection and extraction technique, and the resulting hybrid classifiers are shown to yield further improvements in feature subset parsimony and classification accuracy. A key advantage of the methods presented here is the ability to examine the set of linear feature weights produced by EC to perform data mining and exploratory data analysis. The EC feature selection and extraction technique is applied to an important and difficult problem in biochemistry: the classification of potential protein-water binding sites. The resulting classifier is able to identify water-binding sites with approximately 68% accuracy, and identifies a set of physical and chemical features that correspond well with the results of other studies of protein-water binding.
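The general idea of evolving feature weights for a k-nearest-neighbors classifier can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names (`knn_predict`, `fitness`, `evolve`), the `penalty` term, the leave-one-out scoring, and the simple crossover and mutation operators are all assumptions chosen for illustration. A weight of zero drops a feature (selection), while a nonzero weight rescales it (a simple form of linear extraction).

```python
import random

def knn_predict(train_X, train_y, x, weights, k=3):
    """Majority vote among the k nearest training points, using a
    feature-weighted squared Euclidean distance."""
    dists = sorted(
        (sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = [label for _, label in dists[:k]]
    # sorted() makes tie-breaking deterministic (lowest label wins)
    return max(sorted(set(votes)), key=votes.count)

def fitness(weights, X, y, k=3, penalty=0.01):
    """Leave-one-out accuracy minus a small cost per feature in use.
    The penalty value is a hypothetical choice, not from the source."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], weights, k) == y[i]:
            correct += 1
    used = sum(1 for w in weights if w > 0)
    return correct / len(X) - penalty * used

def evolve(X, y, generations=30, pop_size=20, seed=0):
    """Evolve one weight vector per individual; return the fittest."""
    rng = random.Random(seed)
    n = len(X[0])

    def random_individual():
        # each feature starts either switched off or with a random weight
        return [rng.choice([0.0, rng.random()]) for _ in range(n)]

    def crossover(a, b):
        # uniform crossover: take each weight from one parent at random
        return [rng.choice(pair) for pair in zip(a, b)]

    def mutate(ind):
        # switch one feature off or give it a fresh random weight
        child = ind[:]
        j = rng.randrange(n)
        child[j] = 0.0 if rng.random() < 0.5 else rng.random()
        return child

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, X, y), reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)))
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, X, y))
```

Calling `evolve(X, y)` on a list of feature rows `X` and class labels `y` returns one weight per feature. Inspecting which weights remain nonzero, and how large they are, is the kind of exploratory analysis of feature weights that the abstract describes.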
Keywords/Search Tags:Feature, Protein-water binding, Selection, Pattern recognition, Using evolutionary