Association Analysis Of Gene-disease Relationships Based On Literature Mining And Gene Coexpression Network

Posted on: 2018-06-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Yang
GTID: 1314330515473019
Subject: Bio-IT
Abstract/Summary:
Over the past decades, advances in high-throughput technologies and growing research capacity have produced large-scale biological data. One of the most important tasks in bioinformatics is mining, from these massive data, the molecular mechanisms that lead to disease. This study uses PubMed abstracts and gene coexpression networks generated from microarray data to explore associations between diseases and genes, focusing on two complex diseases, cancer and AIDS.

With the explosive growth of biomedical literature, finding the needed information is becoming increasingly difficult. Complex search requests are hard to satisfy with traditional keyword-based search engines. To address this problem, this study proposes the generalized matching principle of semantic search: records containing the semantics of the query input should be listed in the search results. A semantic search engine, Sensehit, is implemented based on this principle. Sensehit extracts semantics from PubMed abstracts using natural language processing and integrates biomedical background knowledge from MeSH, Entrez Gene, UniProt, UniProt Keywords, Gene Ontology, HGNC, miRBase and HomoloGene. Sensehit can be used to search for biomedical information such as gene regulation, protein-protein interaction, protein modification and causality, facilitating the study of the molecular mechanisms that cause disease.

In recent years, many studies have shown that microRNAs play an important role in cancer. This study explores relationships between microRNAs and cancers from PubMed abstracts. A regular expression is used to recognize microRNA names in text, and MeSH terms are used to identify the cancer types addressed in each paper. The association between every microRNA family and every cancer type is evaluated with Fisher's exact test and stored in a new database called miCancerna, which has an easy-to-use, freely accessible web interface. miCancerna covers more than twice as many papers as a similar database, miR2Disease, with precision over 90%. Significant associations between microRNA families and cancer types are further used to construct an association network. Analysis of the network indicates that some microRNA families associated with specific cancer types may serve as targets for diagnosis and treatment, while others involved in multiple cancer types may play a key role in tumorigenesis.

HIV, the virus that causes AIDS, originated from SIV, which is distributed among African primates. SIV infection of a natural host, the sooty mangabey, is benign, but in a non-natural host, the rhesus macaque, it progresses to AIDS. By comparing gene expression profiles between the two situations, the study explores the molecular mechanism leading from HIV/SIV infection to AIDS. Based on microarray data from sooty mangabeys and rhesus macaques at different time points after infection with the same SIV strain, 14 gene coexpression networks were constructed using the Pearson correlation coefficient. The distributions of positive and negative connectivity differed significantly between the gene coexpression networks of sooty mangabeys and rhesus macaques during SIV infection. Pathway enrichment analysis of hub genes identified 4 pathways enriched in sooty mangabeys, 8 enriched in rhesus macaques and 3 enriched in both. Further analysis of hub genes in the gene coexpression networks may lead to a better understanding of the pathogenesis of HIV/SIV infection and to the development of novel interventions for the rational control of AIDS.
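The literature-mining step described above (recognizing microRNA names with a regular expression, then scoring each microRNA and cancer type pair with Fisher's exact test) can be illustrated with a minimal Python sketch. The abstract does not give the actual pattern or the test's sidedness, so `MIRNA_PATTERN`, `find_mirnas` and the one-sided `fisher_exact_greater` are hypothetical names and assumptions made for illustration, not the implementation behind miCancerna.

```python
import re
from math import comb

# Hypothetical pattern (an assumption, not the dissertation's regex).
# It is meant to match names such as "miR-21", "hsa-miR-155-5p",
# "microRNA-21" and "let-7a".
MIRNA_PATTERN = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|miRNA|microRNA)-?\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b"
    r"|\blet-7[a-z]?\b",
    re.IGNORECASE,
)


def find_mirnas(text):
    """Return microRNA-like names in the order they appear in the text."""
    return [m.group(0) for m in MIRNA_PATTERN.finditer(text)]


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: papers mentioning the microRNA and tagged with the cancer type
    b: papers mentioning the microRNA only
    c: papers tagged with the cancer type only
    d: papers with neither
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1 = a + b          # papers mentioning the microRNA
    col1 = a + c          # papers tagged with the cancer type
    n = a + b + c + d     # all papers
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)


if __name__ == "__main__":
    sentence = "Overexpression of miR-21 and let-7a was seen in hsa-miR-155-5p knockdown cells."
    print(find_mirnas(sentence))
    # Toy counts, invented for illustration only.
    print(fisher_exact_greater(30, 70, 200, 9700))
```

A one-sided test is used here because the question of interest is over-representation of co-mentions; a two-sided variant would be an equally reasonable reading of the abstract.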
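The coexpression analysis described above (Pearson correlation between gene expression profiles, followed by a comparison of positive and negative connectivity) can be sketched as follows. The correlation cutoff of 0.9, the toy expression values and the function names `build_network` and `connectivity` are assumptions made for illustration; the abstract does not state the threshold or the hub-gene definition that was actually used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (sx * sy)


def build_network(expr, threshold=0.9):
    """Link gene pairs whose |r| reaches the (assumed) threshold.

    expr maps a gene name to its expression values across samples.
    Returns {(gene1, gene2): r} with gene1 < gene2 alphabetically.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


def connectivity(edges):
    """Count positive and negative edges per gene as (positive, negative)."""
    counts = {}
    for (g1, g2), r in edges.items():
        for g in (g1, g2):
            pos, neg = counts.get(g, (0, 0))
            counts[g] = (pos + 1, neg) if r > 0 else (pos, neg + 1)
    return counts


# Toy expression profiles, invented for illustration only.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
    "GENE_D": [5.0, 1.0, 4.0, 2.0],
}
edges = build_network(expr)
counts = connectivity(edges)
```

Building one such network per species and time point, then comparing the per-gene positive and negative counts, would give the kind of connectivity distributions the abstract contrasts between sooty mangabeys and rhesus macaques. Genes with the highest total counts are one plausible way to pick hub genes.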
Keywords/Search Tags:text mining, natural language processing, semantic search, gene coexpression network, microRNA, cancer, AIDS