Memetic Algorithm Based Feature Weighting for High-dimensional Metabolomics Data

Posted on: 2015-01-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J R Zhou
Full Text: PDF
GTID: 1268330428959338
Subject: Biomedical engineering
Abstract/Summary:
Metabolites are small-molecule organic compounds produced during metabolism. They are closely related to the physiological state of living organisms and contain a wealth of information. Metabolomics is the quantitative study of the biochemical relationships between metabolite levels and organisms’ multiparametric responses, and aims ultimately at a complete understanding of the underlying biological mechanisms. It is expected to provide a more comprehensive picture of a living system’s physiology than conventional "-omics" methods, and has attracted great attention over the last few years. Metabolomics has been used in a wide range of applications, e.g., biomarker discovery, drug design, toxicology, and environmental science.

Metabolomics feature data are the signals detected from real metabolite compounds and form the data foundation of metabolomics research. Although many machine learning algorithms have been used successfully to extract the underlying biological information, major challenges remain because (1) the high-dimensional feature data consist of thousands of different signals while the number of samples is relatively small; and (2) only a small group of the metabolite signals is related to the target physiological state, and the rest are noise. Existing methods typically introduce feature selection to overcome these problems. Feature selection can be regarded as a special case of feature weighting in which the weight values are restricted to {0,1}. Previous research shows that feature weighting tends to outperform feature selection on data whose feature signals differ in importance. Moreover, by assigning an appropriate weight value, feature weighting can uncover the precise relationship between each metabolite signal and the target physiological state, which is important for subsequent research.
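The relationship between feature selection and feature weighting can be illustrated with a small sketch. It is not taken from the dissertation: the weighted Euclidean distance, the function name `weighted_distance`, and all sample values are assumptions chosen for illustration only. With weights restricted to {0, 1} the weighted distance equals the ordinary distance over the selected signals, while real-valued weights let each signal contribute in proportion to its assumed importance.

```python
import math

def weighted_distance(x, y, weights):
    """Euclidean distance with each squared feature difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Two hypothetical samples with four metabolite signals each (made-up values).
x = [1.0, 5.0, 2.0, 9.0]
y = [2.0, 1.0, 2.5, 3.0]

# Feature selection: weights restricted to {0, 1}; the 2nd and 4th signals are discarded.
selection = [1, 0, 1, 0]
# Feature weighting: weights may take any value in [0, 1].
weighting = [1.0, 0.1, 0.6, 0.0]

d_selected = weighted_distance(x, y, selection)
# Same distance computed directly on the selected subset of signals.
d_subset = weighted_distance([x[0], x[2]], [y[0], y[2]], [1, 1])
d_weighted = weighted_distance(x, y, weighting)

print(d_selected, d_subset, d_weighted)
```

Here `d_selected` and `d_subset` coincide, which is the sense in which selection is a special case of weighting.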
We therefore use feature weighting in this work.

High-dimensional feature weighting is a large-scale optimization problem. It can be solved effectively with computational intelligence methods, particularly memetic algorithms (MAs). MAs combine global evolution with local search, and obtain higher performance than conventional methods, especially on complex problems. In this work we propose a metaheuristics chain (MetaChain) model that uses global and local searches more flexibly and efficiently. Scheduling mechanisms based on probabilistic models are introduced within the MetaChain framework to form two novel MA improvements. Experimental results on large-scale optimization benchmark functions show that the proposed algorithms obtain better results than state-of-the-art counterparts.

By combining the MA-based feature weighting algorithm with prominent machine learning methods, i.e., SVM and ELM, in a wrapper fashion, we propose a novel feature weighting system for metabolomics data classification and regression. We apply this system to orthotopic liver transplantation metabolomics data derived from microdialysis and high-performance liquid chromatography, and to seal pup blood metabolite feature data. Experimental results demonstrate better prediction accuracy for the proposed feature weighting system than for conventional methods. Moreover, the exported feature weights reveal the precise relationships between metabolite signals and target physiological states. This information can be used in further research.
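The general idea of memetic, wrapper-style feature weighting can be sketched as follows. This is a generic illustration and not the MetaChain model or its probabilistic scheduling, whose details the abstract does not give. Several things are assumptions made to keep the example self-contained: a weighted 1-nearest-neighbour classifier scored by leave-one-out accuracy stands in for the SVM and ELM learners, the data are synthetic, and every function name and parameter value is hypothetical.

```python
import random

def loo_accuracy(weights, X, y):
    """Leave-one-out accuracy of a weighted 1-nearest-neighbour classifier (stand-in learner)."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def local_search(weights, fitness, rng, steps=10, sigma=0.1):
    """Hill climbing: keep a small Gaussian perturbation only if fitness does not drop."""
    best, best_fit = weights, fitness(weights)
    for _ in range(steps):
        cand = [min(1.0, max(0.0, w + rng.gauss(0.0, sigma))) for w in best]
        cand_fit = fitness(cand)
        if cand_fit >= best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

def memetic_feature_weighting(X, y, pop_size=10, generations=15, seed=0):
    """Evolve weight vectors in [0, 1]; refine the best one locally in every generation."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(w):
        return loo_accuracy(w, X, y)

    # Start from the unweighted baseline plus random weight vectors.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        # Local search on the current best individual (the "memetic" step).
        elite, _ = local_search(scored[0], fitness, rng)
        parents = [elite] + scored[1 : pop_size // 2]
        # Global search: uniform crossover plus occasional single-weight mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]
            if rng.random() < 0.3:
                child[rng.randrange(n)] = rng.random()
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

def make_toy_data(seed=1, n_samples=24, n_noise=6):
    """Synthetic data: one informative signal followed by pure-noise signals."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([label + rng.gauss(0.0, 0.3)] + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y

X, y = make_toy_data()
baseline = loo_accuracy([1.0] * len(X[0]), X, y)
weights, accuracy = memetic_feature_weighting(X, y)
print("baseline:", baseline, "weighted:", accuracy)
print("weights:", [round(w, 2) for w in weights])
```

Because the unweighted vector is part of the initial population and the best individual is always carried forward, the returned accuracy cannot fall below the unweighted baseline on the same data. In a realistic setting the fitness would be estimated with the actual learner and a proper validation scheme.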
Keywords/Search Tags: Metabolomics, Bioinformatics, Computational intelligence, Memetic algorithm, Feature weighting