
Feature Extraction Based On Complex Network Representation For Metabolomic Data Classification

Posted on: 2016-10-16 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Chen
GTID: 2180330464456903 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As an extension of genomics and proteomics, metabolomics is the quantitative analysis of metabolites and aims to reveal their correlation with target physiological states. Since it was first proposed, metabolomics has made substantial progress and has been applied in areas including functional genomics, drug design, and biomarker identification.
Classification is one of the most widely used analysis methods for metabolomic data. However, metabolomic data are characterized by small sample sizes, high dimensionality, nonlinearity, and noise, which make it difficult for traditional classification algorithms to obtain satisfactory results. In this dissertation we propose a new feature extraction method based on complex network topology representation (NTFE) to improve the classification accuracy of metabolomic data. The NTFE algorithm first constructs a network from the original data. It then applies a new supervised feature selection method based on mutual information, together with an edge trimming method based on conditional mutual information, to reduce the noise in the samples. Finally, several network topological metrics are extracted as new features for the subsequent classification of the samples. NTFE was tested on orthotopic liver transplantation chromatography metabolomic data, and the experimental results indicate that it achieves better prediction results than traditional classification methods.
We also propose a network topological feature extraction algorithm based on a genetic algorithm (GA-NTFE). In GA-NTFE, the sample features and the parameters used in NTFE are encoded as chromosomes, and classification accuracy serves as the fitness value. Experimental results show that GA-NTFE maintains accuracy comparable to NTFE while reducing the number of features and the runtime by 50%.
Moreover, the feature weights obtained by GA-NTFE can be used to describe the specific relationship between each metabolite signal and the target physiological state.
In summary, two feature extraction algorithms based on complex network topology, NTFE and GA-NTFE, are proposed for metabolomic data classification, and both achieve better classification performance on metabolomic data. GA-NTFE maintains high accuracy with a smaller feature subset, which helps clarify the relationships between key metabolites and target physiological states.
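The abstract names two information-theoretic steps in NTFE: supervised feature selection by mutual information (MI) and edge trimming by conditional mutual information (CMI). The sketch below shows one common way to realize both with plug-in entropy estimates on discretized data. It is an illustration under stated assumptions, not the thesis's method: the plain MI ranking stands in for the thesis's new selection method (whose details are not given), and the rule "drop edge (i, j) if conditioning on any single other node makes the CMI fall below a threshold" is a first-order, PCA-CMI-style assumption. The function names, equal-width binning, and thresholds are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def discretize(values, n_bins=3):
    """Hypothetical helper: map continuous values to equal-width bins 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(*columns):
    """Joint Shannon entropy (nats) of one or more discrete columns."""
    counts = Counter(zip(*columns))
    n = sum(counts.values())
    return -sum(c / n * math.log(c / n) for c in counts.values())

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def select_features_by_mi(columns, labels, k):
    """Indices of the k discretized columns sharing the most MI with the class labels."""
    scores = [mutual_information(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def trim_edges_by_cmi(columns, mi_threshold, cmi_threshold):
    """Build an MI network over columns, then drop edges explained by a third node."""
    n = len(columns)
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if mutual_information(columns[i], columns[j]) > mi_threshold}
    kept = set()
    for i, j in edges:
        others = [k for k in range(n) if k not in (i, j)]
        # Keep (i, j) only if it stays informative after conditioning on every other node.
        if all(conditional_mutual_information(columns[i], columns[j], columns[k])
               > cmi_threshold for k in others):
            kept.add((i, j))
    return kept
```

In a toy case where two columns are identical and a third is independent of both, the edge between the identical pair survives trimming, while three identical columns lose every edge, because each pair is fully explained by the third.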
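NTFE then turns the trimmed network into "topological network metrics" used as new features. The abstract does not list which metrics are used or how node-level values map to per-sample features, so the sketch below simply computes three standard node metrics (degree, local clustering coefficient, and a BFS-based closeness) from an undirected edge set. The choice of metrics and the function names are assumptions for illustration.

```python
from collections import deque

def adjacency(n_nodes, edges):
    """Undirected adjacency sets built from (i, j) edge pairs."""
    adj = [set() for _ in range(n_nodes)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbours that are themselves connected."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k) if nbrs[b] in adj[nbrs[a]])
    return 2.0 * links / (k * (k - 1))

def closeness(adj, v):
    """Reachable nodes divided by the sum of their BFS distances; 0.0 if isolated."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

def topology_features(n_nodes, edges):
    """One [degree, clustering coefficient, closeness] row per node."""
    adj = adjacency(n_nodes, edges)
    return [[len(adj[v]), clustering_coefficient(adj, v), closeness(adj, v)]
            for v in range(n_nodes)]
```

For a triangle 0-1-2 with a pendant node 3 attached to node 2, node 0 gets `[2, 1.0, 0.75]` and node 3 gets `[1, 0.0, 0.6]`. The edge set returned by a CMI-trimming step can be passed in directly.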
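GA-NTFE encodes the sample features and an NTFE parameter as a chromosome and uses classification accuracy as the fitness. A minimal sketch of that encoding follows, under loud assumptions: the chromosome is a binary feature mask (a 0/1 feature weight) plus one integer gene, the integer gene is the `k` of a leave-one-out k-nearest-neighbour classifier standing in for "the parameter used in NTFE", and the operators (binary tournament, uniform crossover, bit-flip mutation, elitism) are generic GA choices rather than the thesis's. All names and defaults are hypothetical.

```python
import random

def loo_knn_accuracy(X, y, mask, k):
    """Leave-one-out k-NN accuracy using only the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        dists = sorted((sum((X[i][j] - X[m][j]) ** 2 for j in cols), y[m])
                       for m in range(len(X)) if m != i)
        votes = [label for _, label in dists[:k]]
        correct += max(set(votes), key=votes.count) == y[i]
    return correct / len(X)

def random_chromosome(n_features, k_choices, rng):
    return ([rng.randint(0, 1) for _ in range(n_features)], rng.choice(k_choices))

def crossover(a, b, rng):
    # Uniform crossover on the mask; the parameter gene comes from either parent.
    mask = [ma if rng.random() < 0.5 else mb for ma, mb in zip(a[0], b[0])]
    return (mask, a[1] if rng.random() < 0.5 else b[1])

def mutate(chrom, k_choices, rate, rng):
    # Flip each mask bit, and resample the parameter gene, with probability `rate`.
    mask = [1 - bit if rng.random() < rate else bit for bit in chrom[0]]
    k = rng.choice(k_choices) if rng.random() < rate else chrom[1]
    return (mask, k)

def run_ga(X, y, k_choices=(1, 3, 5), pop_size=20, generations=15, rate=0.1, seed=0):
    """Evolve (feature mask, k) chromosomes; fitness is LOO classification accuracy."""
    rng = random.Random(seed)
    pop = [random_chromosome(len(X[0]), k_choices, rng) for _ in range(pop_size)]
    best, best_fit = pop[0], -1.0
    for _ in range(generations):
        scored = [(loo_knn_accuracy(X, y, *c), c) for c in pop]
        for fit, c in scored:
            if fit > best_fit:
                best, best_fit = c, fit

        def pick():
            # Binary tournament selection on fitness.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [mutate(crossover(pick(), pick(), rng), k_choices, rate, rng)
                    for _ in range(pop_size - 1)]
        pop = children + [best]  # elitism keeps the best chromosome found so far
    return best, best_fit
```

On toy data where only the first feature separates the classes, the search settles on a mask that keeps that feature. In a real GA-NTFE-style setup the fitness call would instead run the network feature extraction followed by the chosen classifier, which is where most of the runtime goes.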
Keywords/Search Tags: Feature Extraction, Feature Selection, Network Topology Metrics, Metabolomics, Genetic Algorithm