Optimization Algorithms Based Protein Mass Spectrometry Data Analysis

Posted on: 2010-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
GTID: 2120360278959831
Subject: Computer application technology
Abstract/Summary:
This paper presents feature selection methods for protein mass spectrometry data based on intelligent optimization. Protein mass spectrometry is a revolutionary technique for detecting early-stage cancer and identifying biomarkers, but its high dimensionality and small sample sizes challenge pattern recognition methods. To avoid the curse of dimensionality, feature selection must be applied to reduce the dimensionality of the mass spectra before classification and analysis. The two most critical components of feature selection are the search strategy and the feature evaluation measure. For biological signals, the literature mostly reports univariate feature evaluation measures, while multivariate measures are seldom used. In this study, two intelligent optimization methods, simulated annealing and the genetic algorithm, are used as search strategies, and five multivariate feature subset evaluation measures, covering both wrapper-based and multivariate filter-based measures, are presented and investigated. K-fold cross-validation is used to divide the data into training and testing subsets, and a classifier based on linear discriminant analysis is employed to validate the feature selection methods. Experiments show that the Mahalanobis distance and the linear combination of the empirical classification error rate and the a-posteriori probability are excellent feature subset evaluation measures. Comparisons show that this approach, which combines intelligent optimization algorithms with the proposed feature subset evaluation measures, performs better than other methods presented in the literature.
Experiments on five popular datasets, obtained from the FDA-NCI Clinical Proteomics Program Databank and the Virginia Prostate Center, show that this approach yields feature subsets with strong class separability, which can serve as a reference for finding biomarkers and detecting early-stage cancer.
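As a rough illustration of the approach described above, the following sketch pairs a simulated annealing search with a Mahalanobis distance criterion for two-class feature subset selection. It is not the thesis's implementation: the function names (`mahalanobis_separation`, `anneal_select`), the swap-one-feature neighbour move, the geometric cooling schedule, the small ridge term, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def mahalanobis_separation(X, y, subset):
    """Squared Mahalanobis distance between the two class means,
    computed on the columns listed in `subset` (labels are 0 and 1)."""
    idx = np.asarray(subset)
    A = X[y == 0][:, idx]
    B = X[y == 1][:, idx]
    diff = A.mean(axis=0) - B.mean(axis=0)
    # Pooled within-class covariance of the two classes.
    Sa = np.atleast_2d(np.cov(A, rowvar=False))
    Sb = np.atleast_2d(np.cov(B, rowvar=False))
    pooled = ((len(A) - 1) * Sa + (len(B) - 1) * Sb) / (len(A) + len(B) - 2)
    # Assumed ridge term: keeps the matrix invertible for small samples.
    pooled += 1e-6 * np.eye(len(idx))
    return float(diff @ np.linalg.solve(pooled, diff))

def anneal_select(X, y, k, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a k-feature subset that maximises the criterion."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    current = list(rng.choice(n_features, size=k, replace=False))
    cur_score = mahalanobis_separation(X, y, current)
    best, best_score = list(current), cur_score
    t = t0
    for _ in range(n_iter):
        # Neighbour move: swap one selected feature for an unselected one.
        candidate = list(current)
        outside = np.setdiff1d(np.arange(n_features), current)
        candidate[rng.integers(k)] = int(rng.choice(outside))
        cand_score = mahalanobis_separation(X, y, candidate)
        delta = cand_score - cur_score
        # Always accept improvements; accept worse subsets with
        # probability exp(delta / t), which shrinks as t cools.
        if delta >= 0 or rng.random() < np.exp(delta / t):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        t *= cooling  # geometric cooling schedule (an assumption)
    return sorted(int(i) for i in best), best_score

# Synthetic stand-in for spectra: 60 samples, 20 features,
# with only features 3 and 7 shifted between the two classes.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(60, 20))
y = np.repeat([0, 1], 30)
X[y == 1, 3] += 3.0
X[y == 1, 7] += 3.0

selected, score = anneal_select(X, y, k=2)
print(selected, round(score, 2))
```

In a setup like the one the abstract describes, the search would run only on the training folds of a k-fold split, with the selected subset then checked by a classifier such as linear discriminant analysis on the held-out fold.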
Keywords/Search Tags: mass spectrometry, simulated annealing, genetic algorithm, feature selection, pattern classification, early-stage cancer detection