
Research On Classification Algorithm Based On Metabolomics Data

Posted on: 2014-07-14
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Xu
GTID: 2268330425491690
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:
Metabolomics is an important branch of systems biology that followed genomics, transcriptomics and proteomics. It describes, both qualitatively and quantitatively, the endogenous metabolites of an organism as a whole and the patterns in how they respond to internal and external environmental changes. Metabolomics research has considerable research value and practical significance in medicine, in the study of disease pathogenesis, in human physiology, and in related fields.
This paper takes nuclear magnetic resonance (NMR) data and liquid chromatography-mass spectrometry (LC-MS) data as its objects and, taking the characteristics of each data set into account, studies the classification accuracy and applicability of principal component analysis (PCA), partial least squares (PLS) and the isometric mapping algorithm (ISOMAP). It also introduces kernel principal component analysis (KPCA) and orthogonal partial least squares (OPLS) to overcome the shortcomings of PCA and PLS.
This paper applies independent component analysis (ICA) to the data preprocessing stage for the first time, and the results show that ICA can separate the independent components of metabolomics data and effectively reduce noise signals. Data preprocessing is therefore essential for the subsequent classification. Because many classification algorithms are derived from PCA, the paper first explains the principle of PCA in detail and then uses PCA to classify the first NMR data set. Two- and three-dimensional score plots of the sample points were drawn: the separation by gender is only mediocre, and the separation by drug class is poor. Given the limitations of PCA as a linear model, the paper therefore proposes an optimization based on kernel principal component analysis.
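The abstract does not say how the PCA scores or the KPCA kernel were computed, so the following is only a minimal NumPy sketch under stated assumptions: PCA scores come from an SVD of the mean-centred data matrix (samples as rows, spectral variables as columns), and KPCA comes from double-centring a kernel matrix and taking its leading eigenvectors. The Gaussian (RBF) kernel, its `gamma` value and the random data are assumptions for illustration, not the thesis's settings. With a linear kernel, KPCA should reproduce the linear PCA scores up to the sign of each component, which makes a convenient sanity check.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Linear PCA: project mean-centred data onto the leading principal axes."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data, Xc = U S Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # same as Xc @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)                  # explained-variance ratios
    return scores, explained[:n_components]


def linear_kernel(X):
    return X @ X.T


def rbf_kernel(X, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2) (assumed kernel choice)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kpca_scores(K, n_components=2):
    """Kernel PCA scores of the training samples from a precomputed kernel matrix."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    # Centre the implicit feature vectors by double-centring the kernel matrix.
    Kc = K - one @ K - K @ one + one @ K @ one
    Kc = (Kc + Kc.T) / 2.0                       # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(Kc)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[order], 0.0, None)
    # The projection of sample i onto component k is sqrt(lambda_k) * v_ik.
    return eigvecs[:, order] * np.sqrt(lam)


# Hypothetical data: 30 samples, 5 variables with decreasing spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])

pca, ratio = pca_scores(X, 2)
kpca_lin = kpca_scores(linear_kernel(X), 2)
kpca_rbf = kpca_scores(rbf_kernel(X, gamma=0.1), 2)  # gamma is an assumed value

print("explained variance ratios:", ratio)
print("linear KPCA matches PCA up to sign:", np.allclose(np.abs(pca), np.abs(kpca_lin), atol=1e-6))
```

Plotting the first two columns of `pca` or `kpca_rbf` gives a two-dimensional score plot. Swapping `linear_kernel` for a nonlinear kernel is what allows KPCA to capture structure a linear score plot misses; in practice `gamma` would have to be tuned, for example by cross-validation.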
Compared with PCA, KPCA improves the classification significantly, but it still does not achieve the desired classification when several influencing factors are present. To solve this problem the paper applies PLS, which successfully handles the case of multiple influencing factors. To maximize the separation between classes, the paper then optimizes PLS with orthogonal partial least squares and interprets the classification results and the biomarkers comprehensively from the loading plots. Finally, for comparison with the traditional classification algorithms, the paper proposes a new isometric mapping algorithm and focuses on its ability to predict unknown data. The accuracy and reliability of the algorithm are verified by cross-validation.
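The thesis does not give its PLS implementation. As a hedged sketch, PLS can be used for classification (often called PLS-DA) by regressing a class label coded as -1/+1 on the data with the NIPALS PLS1 algorithm and thresholding the prediction at 0; the NumPy code below does this and estimates accuracy by k-fold cross-validation. The label coding, the threshold, the number of components, the fold count and the synthetic data are all assumptions. OPLS, which additionally removes the variation in X that is orthogonal to the class label, and ISOMAP are not shown.

```python
import numpy as np


def pls1_fit(X, y, n_components=2):
    """PLS1 regression by NIPALS; returns the centring terms and coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # residual (deflated) data
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # score vector
        tt = t @ t
        p = Xr.T @ t / tt                    # X loading
        c = (yr @ t) / tt                    # y loading
        Xr = Xr - np.outer(t, p)             # deflate X and y
        yr = yr - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients for the centred original variables.
    B = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, B


def pls1_predict(model, X):
    x_mean, y_mean, B = model
    return (X - x_mean) @ B + y_mean


def cross_val_accuracy(X, labels, n_components=2, k=5, seed=0):
    """k-fold cross-validated accuracy of PLS-DA with labels coded -1/+1."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    correct = 0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = pls1_fit(X[train], labels[train], n_components)
        pred = np.where(pls1_predict(model, X[fold]) >= 0.0, 1.0, -1.0)
        correct += int(np.sum(pred == labels[fold]))
    return correct / len(labels)


# Hypothetical synthetic data: 2 groups x 40 samples x 10 variables;
# the groups differ only in the mean of variable 0.
rng = np.random.default_rng(1)
labels = np.repeat([-1.0, 1.0], 40)
X = rng.normal(size=(80, 10))
X[:, 0] += 3.0 * labels

acc = cross_val_accuracy(X, labels, n_components=2, k=5)
print(f"5-fold CV accuracy: {acc:.2f}")
```

Each sample is predicted only by a model that never saw it during fitting, which is what makes the cross-validated accuracy an estimate of performance on unknown data. The same loop could wrap another classifier, such as an ISOMAP-based one, by replacing the two PLS calls.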
Keywords/Search Tags: Metabolomics, Data Classification, PCA, PLS, ISOMAP