
Algorithms On Identification Of Disease-Associated Characteristics Based On Microbial Meta-Omic Data

Posted on: 2024-03-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Q Liu
Full Text: PDF
GTID: 1524306923457794
Subject: Operational Research and Cybernetics
Abstract/Summary:
Trillions of microbes inhabit the human body and are distributed across various body sites, such as the skin, mouth, and intestines. These microbes participate in human physiological activities through multiple mechanisms, such as regulating the immune system and affecting metabolism and nutrient absorption. The human microbiome has been shown to be closely related to the occurrence and development of human diseases. Identifying disease-associated microbial characteristics is therefore of great significance for the early diagnosis, treatment, and prognosis of diseases. These characteristics include various factors that contribute to detecting and understanding disease occurrence or progression, such as microbial species, genes, and metabolic pathways. Although many studies have identified a variety of disease-associated characteristics, the conclusions of different studies differ considerably. Given the complex pathogenic mechanisms of diseases, these conclusions cannot be used directly in clinical practice. Traditional experimental methods not only consume substantial manpower, material, and financial resources, but are also easily disturbed by environmental conditions and experimental technologies. How to identify disease-associated microbial characteristics efficiently and accurately is therefore an urgent problem.

Advances in sequencing technologies provide an unprecedented opportunity to identify disease-associated microbial characteristics. Given large amounts of microbial data, effective computational models for identifying disease-associated microbial characteristics can significantly shorten the study period and reduce experimental costs. Various computational methods have been proposed for this purpose, including methods based on statistical models, matrix decomposition, network models, and machine learning. Although these methods have achieved much success, the identification of disease-associated microbial characteristics still faces many challenges. First, most methods use 16S ribosomal RNA or metagenomic sequencing data from tissue swabs or fecal samples to study the composition of microbial communities. Such data make it difficult to explore functional activities within microbial communities and cannot reflect disease-associated microbial features in internal organs. Second, existing methods rely heavily on disease-related sample metadata and compare disease and healthy cohorts to find disease-associated microbial characteristics. However, the metadata of the majority of samples are currently imperfect, which limits the applicability of these methods. Moreover, most methods treat individual microbial features, such as species or genes, as independent entities and ignore the complex interactions within microbial communities. These interactions are important for maintaining the compositional and functional balance of microbial systems and are closely related to the health and disease status of the host. These challenges motivated us to design efficient and robust methods for identifying disease-associated microbial characteristics.

To address these challenges, the main work of this dissertation is as follows.

First, we proposed a MIcrobial Cancer-association Analysis using a Heterogeneous graph transformer (MICAH) to identify intratumoral microbial communities associated with cancer. We constructed a heterogeneous graph to represent the phylogenetic and metabolic relationships among microbial species, as well as the relationships between microbes and their hosts. Based on this heterogeneous graph representation, node features were updated using a node-type and edge-type dependent attention mechanism. The correlations between microbial communities and sample phenotypes were revealed by optimizing sample classification results. Finally, we identified cancer-associated microbial communities for each cancer type by selecting communities consisting of statistically significant species with high attention scores. Unlike existing methods, MICAH uses microbial data isolated from sequencing data of host tissues, which allows it to identify cancer-associated characteristics in internal organs and shed light on how microbes interact with the host in the tumor microenvironment. Furthermore, the interactions within microbial communities are included in the heterogeneous graph representation, which compensates for the limited treatment of interactions in existing methods. Additionally, to better reflect the heterogeneity of the graph, different feature representation spaces are allocated to different types of nodes and edges during model optimization. This helps to capture the specific information transmitted by different relations. We applied MICAH to a dataset of five cancer types and found that it could identify cancer-related microbial characteristics more accurately than existing methods.

Second, we designed a heuristic algorithm, IDAM, based on a graph optimization model, to Identify Disease-Associated gene Modules. The algorithm identifies disease-associated gene modules by detecting local low-rank submatrices in the expression matrix while maximizing the number of connected components between these submatrices and uber-operon structures. Unlike existing methods, the algorithm uses paired metagenomic and metatranscriptomic data to assess active functions within microbial communities, enabling comprehensive mining of the data and an understanding of the functional activities performed by multiple microbes. Additionally, IDAM uses heuristic ideas to integrate gene context conservation and regulation mechanisms in order to capture cross-species functional mechanisms, which can reduce false positives and yield functionally meaningful results in the inference of microbial characteristics. Furthermore, the algorithm focuses on features of the data rather than relying on prior sample metadata. This allows it to complement previous methods that rely on sample disease labels to identify microbial characteristics, and to reduce the bias from misleading metadata in the absence of high-quality metadata. We applied IDAM to publicly available datasets from inflammatory bowel disease, melanoma, type 1 diabetes mellitus, and irritable bowel syndrome, and demonstrated its superior performance in inferring disease-associated characteristics compared to existing popular tools.

Currently, MICAH has been implemented in Python and can be freely downloaded from https://github.com/OSU-BMBL/micah. The IDAM algorithm has been implemented in C, and the source code is freely available at https://github.com/OSU-BMBL/IDAM.
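
As a rough illustration of the node-type and edge-type dependent attention described for MICAH, the following is a minimal sketch in plain Python. It is not taken from the MICAH source code. The toy graph, the node and edge type names, the projection matrices, the scalar edge weights, and the function names are all hypothetical and chosen only to show the idea: each node type gets its own projection (its own representation space), each edge type rescales the attention score, and a softmax over a node's incoming edges yields the attention weights used to update its features.

```python
import math

# Hypothetical toy graph: node name -> (node type, feature vector).
nodes = {
    "sample_1": ("sample", [1.0, 0.0]),
    "species_a": ("species", [0.5, 1.0]),
    "species_b": ("species", [1.0, 1.0]),
}

# Incoming edges per target node: (source node, edge type). Illustrative only.
in_edges = {
    "species_a": [("species_b", "phylogenetic"), ("sample_1", "hosted_by")],
}

# One projection matrix per node type, i.e. a separate representation space.
# These values are made up; a trained model would learn them.
proj = {
    "sample": [[1.0, 0.0], [0.0, 1.0]],
    "species": [[0.5, 0.5], [0.0, 1.0]],
}

# One scalar per edge type (an assumption made for brevity; a full model
# would typically learn a matrix per edge type instead).
edge_weight = {"phylogenetic": 1.0, "metabolic": 1.0, "hosted_by": 0.5}


def matvec(matrix, vector):
    """Multiply a small dense matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def project(name):
    """Project a node's features with the matrix for its node type."""
    node_type, features = nodes[name]
    return matvec(proj[node_type], features)


def attention_update(target):
    """Return (attention weight per source node, updated target features)."""
    query = project(target)
    sources, scores, messages = [], [], []
    for source, edge_type in in_edges[target]:
        key = project(source)
        dot = sum(a * b for a, b in zip(query, key))
        # Scaled dot-product score, rescaled by the edge-type weight.
        scores.append(edge_weight[edge_type] * dot / math.sqrt(len(query)))
        sources.append(source)
        messages.append(key)
    # Numerically stable softmax over the incoming edges.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Updated features: attention-weighted sum of the projected neighbors.
    updated = [
        sum(w * msg[i] for w, msg in zip(attn, messages))
        for i in range(len(query))
    ]
    return dict(zip(sources, attn)), updated


weights, new_features = attention_update("species_a")
```

In this toy setting, `weights` maps each neighbor of `species_a` to its attention weight. Ranking nodes by such weights is one simple way to picture how "species with high attention scores" could be shortlisted, although the dissertation additionally requires statistical significance.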
Keywords/Search Tags: Algorithms in bioinformatics, Microbiome, Complex diseases, Heterogeneous graph attention network, Optimization algorithm