Application Of Bayesian Integration Network In Omics Data

Posted on: 2022-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: P Y Dai
Full Text: PDF
GTID: 2480306740488864
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background and objective: In recent years, the application of metabolomics in disease recognition, clinical diagnosis and prognosis has developed rapidly, and network graphical models are among the effective statistical methods for identifying associations between metabolites. Because of their high computational efficiency and ability to handle complex structures, Bayesian networks and integrated networks have received particular attention in the analysis of metabolomics data. To provide an effective analysis method for applying metabolomics data to disease diagnosis and prognosis research, this study examined, through simulation studies and case data, the application of the Conditional Gaussian Bayesian Network (CGBN) to the classification of omics data and the integration of multi-source metabolomics data with integrated undirected networks.

Main research contents: The first part evaluated and compared the classification ability of the CGBN. Simulated data sets with different characteristics (different correlation coefficients, linear or nonlinear correlations, and different sparsity) were generated. The classification performance of CGBN was compared with that of logistic regression, partial least squares discriminant analysis (PLSDA), random forest (RF) and support vector machine (SVM). Using a public breast cancer metabolomics database, the ability of CGBN to identify breast cancer patients or early breast cancer patients was explored, and a strategy for identifying metabolic biomarkers for the diagnosis of breast cancer was discussed. The second part evaluated and compared the accuracy of integrated networks built with a hierarchical Bayesian method. Different network structures (band network, cluster network, scale-free network and random network) and their similar subgroup networks were simulated to assess the accuracy of the Bayesian Hierarchy Graph (BHG) and Hub-BHG in recognizing graphical structure under different prior parameter settings. These two methods were then compared with other network modeling methods such as joint LASSO, BEAM and SSSL. The integrative network approach was applied to metabolomics data from different platforms (plasma and serum) and different stages (control group, early breast cancer and late breast cancer) to explore its ability to build a network graph from data of different sources and to study the dynamic changes in metabolite associations.

Main results: First part, simulation study: when there was a nonlinear correlation between the independent and dependent variables, the area under the curve (AUC) of the CGBN was higher than that of logistic regression, PLSDA, RF and SVM. CGBN also classified better when the variables were highly correlated or had low sparsity. Case analysis: CGBN showed good classification performance on both the full data set and the early-stage data set (AUC = 0.985 and 0.962, respectively). This study found that metabolites such as asparagine, glutamate and taurine could serve as potential biomarkers for early diagnosis. Second part, simulation study: BHG was the best at identifying the network structure of the simulated band, cluster and random networks, with AUCs of 0.857, 0.839 and 0.745, respectively, at a sample size of 50, and 0.906, 0.910 and 0.808, respectively, at a sample size of 100. When the network structure was a scale-free network, the AUC of the Hub-BHG method was higher than that of BHG, at 0.795 and 0.825, respectively. The accuracy, F-score and Matthews correlation coefficient of the BHG and Hub-BHG methods were higher than those of the other methods. Case analysis: the similarity coefficient of the metabolite integration networks for plasma and serum was 0.3778, and the integrative network method also revealed differences between the two platforms. At all three stages, the number of edges, network density and clustering coefficient of the metabolite network were higher in plasma than in serum. The connectivity and density of the plasma metabolite network increased from the control group through the early stage to the late stage.

Main conclusions: The Bayesian network classification model based on CGBN is superior to other common classification methods for high-dimensional metabolomics data, especially when the sample size is small. Network-based analysis can identify disease-related metabolic biomarkers more efficiently by establishing local directed graphical networks related to the disease. The integrated undirected network graph of multi-platform and multi-group metabolomics data can effectively integrate data and improve the efficiency of network structure learning. However, because of the network complexity caused by the heterogeneity of multi-source data and the uncertainty of the biological relationships among metabolites, the practical performance of the methods discussed in this study still needs further investigation.
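The edge-recovery metrics named in the results (accuracy, F-score and Matthews correlation coefficient) can be illustrated with a minimal Python sketch. This is not the code used in the thesis: the function name `edge_recovery_metrics`, the edge-list input format and the toy graphs are assumptions made for illustration. The sketch treats every unordered pair of nodes as one binary prediction (edge present or absent) and compares an estimated undirected network against a known true network.

```python
from itertools import combinations
from math import sqrt


def edge_recovery_metrics(true_edges, est_edges, n_nodes):
    """Compare an estimated undirected graph with the true graph.

    Hypothetical helper: each unordered node pair counts as one
    binary decision (edge present / absent).
    """
    true_set = {frozenset(e) for e in true_edges}
    est_set = {frozenset(e) for e in est_edges}

    tp = fp = fn = tn = 0
    for pair in combinations(range(n_nodes), 2):
        key = frozenset(pair)
        in_true, in_est = key in true_set, key in est_set
        if in_true and in_est:
            tp += 1          # edge correctly recovered
        elif in_est:
            fp += 1          # spurious edge
        elif in_true:
            fn += 1          # missed edge
        else:
            tn += 1          # absence correctly recovered

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0

    f_denom = 2 * tp + fp + fn
    f_score = 2 * tp / f_denom if f_denom else 0.0

    mcc_denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0

    return {"accuracy": accuracy, "f_score": f_score, "mcc": mcc}


# Toy example (made-up graphs on 4 nodes, not data from the thesis).
true_edges = [(0, 1), (1, 2), (2, 3)]
est_edges = [(0, 1), (1, 2), (0, 3)]
metrics = edge_recovery_metrics(true_edges, est_edges, n_nodes=4)
print(metrics)
```

In sparse networks most node pairs are true negatives, so accuracy alone can look high even when few edges are recovered; the F-score and Matthews correlation coefficient are less affected by that imbalance, which is one plausible reason for reporting all three.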
Keywords/Search Tags:Metabolomics, Conditional Gaussian Bayesian Network, Classification Method, Integrating Gaussian Graph Network, Markov Random Field