Research on Trend Analysis of Metabolomics Time Series Data

Posted on: 2013-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
GTID: 2248330371997337
Subject: Computer application technology
Abstract/Summary:
Metabolomics is an important branch of systems biology. Its main purpose is to study the patterns by which metabolites change when an organism is subjected to a stimulus or perturbation. Changes in an organism's metabolites reflect its physiological status and physical condition. Nuclear magnetic resonance and chromatography coupled with mass spectrometry are currently the two main analytical technologies in metabolomics, and the datasets they produce are typically high-dimensional. Selecting the informative variables from such large datasets is therefore important for understanding complex biological processes.

This thesis studies the analysis of metabolomics time series and discusses methods for analyzing the overall metabolic trend of an organism. It reviews the classification, clustering, and feature selection techniques of data mining, briefly introduces existing time series analysis techniques, and proposes a clustering-based strategy for analyzing metabolomics time series data.

Metabolomics time series data usually contain hundreds or even thousands of metabolites, many of which are functionally similar. The proposed method first transforms the data and then groups the metabolites using ensemble clustering. Each group is compared between the control group and the disease group; the clusters that differ significantly are retained, and the variation trends of their metabolites are examined against the corresponding control group, yielding the final group of metabolites associated with disease progression. Two real time series datasets were used to validate the method: a rat liver disease dataset and a rice sheath blight dataset. On the rat liver disease dataset, an SVM classifier trained on the selected feature subset achieved an accuracy of 97.83%.
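The abstract does not specify the transform, the base clusterer, or the significance test used in the grouping and screening steps described above. The following minimal Python sketch illustrates those steps under stated assumptions: z-scoring stands in for the data transformation, a co-association (evidence accumulation) consensus with a majority-vote threshold stands in for the ensemble clustering, and Welch's t statistic stands in for the control-versus-disease comparison. The function names `zscore`, `consensus_groups`, and `welch_t` and the toy data are hypothetical, not taken from the thesis.

```python
from math import sqrt
from statistics import mean, pstdev, variance


def zscore(profile):
    """Standardise one metabolite's time profile to mean 0 and unit population SD.

    Assumption: z-scoring stands in for the thesis's unspecified transform.
    """
    mu, sd = mean(profile), pstdev(profile)
    if sd == 0:
        return [0.0 for _ in profile]
    return [(x - mu) / sd for x in profile]


def consensus_groups(partitions, threshold=0.5):
    """Simple-voting consensus over base partitions (co-association matrix).

    partitions: list of label lists, one label per metabolite (hypothetically
    from several base clustering runs on the transformed profiles). Two
    metabolites are linked when more than `threshold` of the partitions put
    them in the same cluster; consensus groups are the connected components.
    """
    n = len(partitions[0])
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            votes = sum(1 for labels in partitions if labels[i] == labels[j])
            if votes / len(partitions) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def welch_t(control, disease):
    """Welch's t statistic for one group score, control vs disease samples."""
    se = sqrt(variance(control) / len(control) + variance(disease) / len(disease))
    return (mean(disease) - mean(control)) / se


# Hypothetical toy input: three base partitions of five metabolites.
base = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 0, 0, 2]]
print(consensus_groups(base))  # [[0, 1], [2, 3], [4]]
# Hypothetical per-sample scores of one metabolite group.
print(welch_t([1.0, 1.2, 0.9, 1.1], [2.0, 2.3, 1.9, 2.1]))
```

In practice the base partitions might come from repeated runs of a clusterer such as k-means with different seeds or cluster counts, and the per-sample group score passed to `welch_t` might be the mean z-score of the group's metabolites; both are illustrative assumptions rather than details from the thesis.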
The experimental results on both datasets demonstrate the effectiveness and feasibility of the method.

Finally, this thesis proposes a weighted clustering combination method. Experiments on the rat liver disease dataset show that it outperforms the original simple voting method.
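The abstract contrasts a weighted clustering combination with simple voting but does not state how the weights are computed. As a minimal sketch, assuming each base partition carries a non-negative weight (for example a hypothetical per-partition quality score), weighted voting can replace the vote count in a co-association consensus with a weighted share. The function `weighted_consensus` and the example weights are hypothetical illustrations, not the thesis's scheme.

```python
def weighted_consensus(partitions, weights, threshold=0.5):
    """Weighted-voting consensus: each base partition's vote counts in proportion to its weight.

    With equal weights this reduces to simple majority voting. How the
    weights are chosen is an assumption (e.g. a per-partition quality score).
    """
    n = len(partitions[0])
    total = float(sum(weights))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Weighted share of partitions that place metabolites i and j together.
            support = sum(w for labels, w in zip(partitions, weights) if labels[i] == labels[j])
            if support / total > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical toy input: three base partitions of five metabolites.
base = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 0, 0, 2]]
print(weighted_consensus(base, [1, 1, 1]))  # [[0, 1], [2, 3], [4]]
print(weighted_consensus(base, [1, 3, 1]))  # [[0, 1, 2], [3, 4]]
```

With equal weights the result matches simple voting; raising the weight of the second partition lets its grouping dominate, which is the kind of behavior a weighted combination is meant to allow when some base clusterings are more reliable than others.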
Keywords/Search Tags: Metabolomics, Time Series, Clustering, SVM