Development And Application Of RNA Interaction Text-mining Tool

Posted on: 2020-05-05    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Zhang
GTID: 1360330596475926    Subject: Biomedical engineering
Abstract/Summary:
RNA interactomics is an important research area of RNomics that mainly focuses on (1) identifying and summarizing RNA interaction data and (2) analyzing and mining RNA interaction networks. Various high-throughput experimental techniques, prediction algorithms, and databases of RNA-RNA interactions (RRIs) have been developed, providing valuable platforms for collecting and reorganizing RNA interaction data. However, numerous experimental and computational studies have greatly expanded the number of known RRIs, which are usually scattered across a vast body of biological literature, making manual curation a near-impossible task. Developing text-mining methods for extracting RRI information is therefore an important way to tackle this problem. Meanwhile, traditional biological experiments and techniques cannot cope with the big data of RNA interaction networks. This motivates a new research approach: mining clues from the huge RNA interaction network with mathematical and statistical methods, and then validating them by biological experiments. In this study, we developed a text-mining software platform, RIscoper (RNA Interactome Scoper), that extracts RRIs from the literature based on the N-gram model. Then, using the RRI data collected with the assistance of RIscoper, we performed an in-depth analysis of the ncRNA-associated virus-host crosstalk network and tried to reveal the potential ncRNA-associated molecular mechanisms of viral infection. The main contents of this study are as follows:
1. We developed the algorithm workflow and software of RIscoper based on the N-gram model. Step 1 is sentence standardization, mainly sentence segmentation and word lemmatization. Step 2 is named entity recognition (identifying the RNA names in sentences). Step 3 is sentence scoring: sentences are first scored with the N-gram statistical model, and the Katz smoothing algorithm and the geometric mean are then used to smooth and normalize the scores, respectively.
2. We built a positive RRI corpus (13,377 sentences containing definite RRI information) as the standard training set of RIscoper. All sentences in the corpus were manually curated from over 5,000 publications in PubMed, and they cover diverse RRI information, including interactions among mRNAs, lncRNAs, miRNAs, sRNAs, circRNAs, snoRNAs, snRNAs, scaRNAs, and scRNAs.
3. We evaluated the performance of RIscoper with 10-fold cross-validation. RIscoper performed well in both RRI discovery (precision: 90.4%, recall: 93.9%) and protein-protein interaction (PPI) discovery (precision: 90.3%, recall: 94.1%). Moreover, case studies demonstrated that RIscoper is practical for RRI discovery.
4. We integrated human PPI and ncRNA-protein interaction (NPI) data, as well as viral PPI and NPI data (some NPI data were collected with the assistance of RIscoper), from multiple resources, and analyzed the centrality of the human proteins/ncRNAs targeted by viral ncRNAs. The human proteins/ncRNAs targeted by viruses were primarily hubs and bottlenecks in the human PPI and NPI networks (Wilcoxon rank-sum test; targeted proteins: degree P = 1.99E-11, betweenness P = 9.32E-09; targeted ncRNAs: degree P < 2.2E-16, betweenness P < 2.2E-16), revealing more frequent crosstalk between viral ncRNAs and human hub and bottleneck proteins/ncRNAs. For example, p53, a hub and bottleneck in the human PPI network, is directly targeted by the EBV-encoded miR-BHRF1-1 in the control of EBV late lytic replication, and BCL2 is directly targeted by ebv-miR-BHRF1-2, which could inhibit early apoptosis.
5. We analyzed the centrality and functions of the human proteins targeted by both viral ncRNAs and viral proteins. The degree and betweenness centrality of these shared proteins were significantly higher than those of human proteins targeted only by viral ncRNAs or only by viral proteins, and functional enrichment analysis showed that the shared proteins were significantly enriched in cell-death-related processes, particularly in different autophagy subnetworks.
6. We evaluated the significance of the overlap between the targets of each pair of viral and host ncRNAs. A total of 820 ncRNA pairs showed significant overlap (hypergeometric test, P < 0.01), suggesting that some viral and host ncRNAs may be functionally homologous.
7. We characterized and classified the viruses based on the ncRNA-associated virus-host crosstalk network and identified 6 viral clusters using the random walk with restart (RWR) algorithm. Functional enrichment analysis further revealed that viruses in different clusters had different functional tendencies, whereas viruses within the same cluster shared similar functions, suggesting diverse pathogenesis between clusters and homology within clusters.
In summary, this study addresses the direction and needs of RNA interactomics by developing an RRI text-mining tool (RIscoper) for RRI discovery, which may provide data accumulation and technical support for RNomics research. In addition, we analyzed in depth the ncRNA-associated virus-host crosstalk network (collected with the assistance of RIscoper) and revealed some potential molecular mechanisms of viral infection, providing knowledge that may help uncover the underlying mechanisms of viral infection and guide the development of novel therapeutics.
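The sentence-scoring step (step 3 of point 1) can be illustrated with a minimal sketch. The abstract does not state the N-gram order, the discounting parameters, or any score cutoff, so everything below is an assumption: a bigram model over lemmatized tokens in which the NER step has already replaced RNA names with a placeholder token `RNA`, and a simplified Katz-style backoff that subtracts a fixed absolute discount (hypothetical `discount=0.5`) from seen bigram counts instead of the Good-Turing discounts of true Katz smoothing, redistributing the freed mass over unseen words in proportion to an add-one unigram model. The class name `BackoffBigram` and the toy corpus are hypothetical. The sentence score is the geometric mean of the per-token probabilities, which keeps longer sentences from being penalized simply for their length.

```python
import math
from collections import Counter


class BackoffBigram:
    """Hypothetical bigram scorer with a simplified Katz-style backoff (fixed discount)."""

    def __init__(self, sentences, discount=0.5):
        self.d = discount                      # must be < 1 so seen bigrams keep mass
        self.uni, self.bi, self.hist = Counter(), Counter(), Counter()
        self.followers = {}                    # history word -> set of words seen after it
        for tokens in sentences:
            padded = ["<s>"] + list(tokens) + ["</s>"]
            self.uni.update(padded[1:])        # <s> is never predicted
            for u, w in zip(padded, padded[1:]):
                self.bi[(u, w)] += 1
                self.hist[u] += 1
                self.followers.setdefault(u, set()).add(w)
        self.n = sum(self.uni.values())
        self.v = len(self.uni)

    def p_uni(self, w):
        # Add-one unigram estimate with one extra slot reserved for unseen words.
        return (self.uni[w] + 1) / (self.n + self.v + 1)

    def p(self, w, u):
        """P(w | u): discounted bigram if seen, otherwise back off to the unigram."""
        c_h = self.hist[u]
        if c_h == 0:                           # unseen history: use the unigram model
            return self.p_uni(w)
        c = self.bi[(u, w)]
        if c > 0:
            return (c - self.d) / c_h
        seen = self.followers[u]
        reserved = self.d * len(seen) / c_h    # mass freed by discounting
        unseen_mass = 1.0 - sum(self.p_uni(x) for x in seen)
        return reserved * self.p_uni(w) / unseen_mass

    def score(self, tokens):
        """Geometric mean of per-token probabilities (length-normalized score)."""
        padded = ["<s>"] + list(tokens) + ["</s>"]
        logs = [math.log(self.p(w, u)) for u, w in zip(padded, padded[1:])]
        return math.exp(sum(logs) / len(logs))


# Toy usage with a hypothetical, already lemmatized and NER-tagged corpus.
train = [s.split() for s in ["RNA directly target RNA",
                             "RNA bind to RNA",
                             "RNA target RNA and suppress RNA"]]
model = BackoffBigram(train)
print(model.score("RNA directly target RNA".split()))
print(model.score("the weather be sunny today".split()))
```

With this toy corpus the RRI-like sentence scores well above the unrelated one; a real scorer would be trained on the curated RRI corpus and would need a decision cutoff that the abstract does not specify.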
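The 10-fold cross-validation in point 3 reports precision and recall. A minimal sketch of such an evaluation loop follows; it assumes a labelled set containing both positive and negative sentences, a random (unstratified) fold split, and counts pooled over all folds (micro-averaging). None of these details are given in the abstract. The `fit_predict` callable and both function names are a hypothetical interface standing in for training and applying the scorer.

```python
import random


def k_folds(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(samples, labels, fit_predict, k=10, seed=0):
    """Precision and recall pooled over k folds.

    fit_predict(train_x, train_y, test_x) must return one 0/1 prediction per
    test sample (hypothetical interface).
    """
    tp = fp = fn = 0
    for test_idx in k_folds(len(samples), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        preds = fit_predict([samples[i] for i in train_idx],
                            [labels[i] for i in train_idx],
                            [samples[i] for i in test_idx])
        for i, pred in zip(test_idx, preds):
            if pred and labels[i]:
                tp += 1
            elif pred:
                fp += 1
            elif labels[i]:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Pooling the counts weights every sentence equally; averaging per-fold precision and recall instead is an equally common choice and can give slightly different figures.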
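The centrality comparison in point 4 rests on the Wilcoxon rank-sum test. Below is a minimal sketch of that test using the normal approximation with a tie correction and no continuity correction. It assumes the degree or betweenness values have already been computed for every node, for example with NetworkX's `degree_centrality` or `betweenness_centrality`. The abstract does not say whether a one- or two-sided alternative was used, so both are offered. The function name `rank_sum_test` and the sample values are hypothetical; SciPy's `scipy.stats.mannwhitneyu` computes the equivalent statistic.

```python
import math
from collections import Counter


def rank_sum_test(x, y, alternative="two-sided"):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation with ties.

    alternative="greater" tests whether values in x tend to exceed those in y.
    Returns (z, p).
    """
    values = sorted(list(x) + list(y))
    counts = Counter(values)
    first = {}
    for i, v in enumerate(values, start=1):
        first.setdefault(v, i)                 # first rank occupied by each value
    # Tied values share the average of the ranks they occupy.
    rank = {v: first[v] + (counts[v] - 1) / 2 for v in counts}

    n1, n2 = len(x), len(y)
    n = n1 + n2
    u = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                               # all values tied: no evidence either way
        return 0.0, 1.0
    z = (u - mu) / math.sqrt(var)
    if alternative == "greater":
        p = 0.5 * math.erfc(z / math.sqrt(2))
    else:
        p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Made-up degrees of virus-targeted vs. non-targeted human proteins.
targeted = [10, 12, 15, 20, 25, 30]
others = [1, 2, 2, 3, 4, 5, 6]
print(rank_sum_test(targeted, others, alternative="greater"))
```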
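Point 6 tests whether a viral ncRNA and a host ncRNA share more targets than expected by chance. Under the standard hypergeometric null, with a background of N host genes, K targets of one ncRNA, and n targets of the other, the probability of observing at least k shared targets is P(X >= k) = sum over i from k to min(K, n) of C(K, i) * C(N - K, n - i) / C(N, n). The sketch below evaluates this upper tail exactly with `math.comb`. The function name, the background size of 200 genes, and the gene sets are hypothetical, and the abstract does not say whether the P < 0.01 threshold was applied after multiple-testing correction. SciPy's `scipy.stats.hypergeom` provides the same tail through its `sf` method.

```python
from itertools import product
from math import comb


def overlap_pvalue(targets_a, targets_b, background_size):
    """Upper-tail hypergeometric P(X >= observed overlap).

    background_size must be at least the size of the union of both target sets.
    """
    a, b = set(targets_a), set(targets_b)
    k = len(a & b)
    K, n, N = len(a), len(b), background_size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Hypothetical target sets over a hypothetical background of 200 host genes.
viral = {"vmiR-1": {"G1", "G2", "G3", "G4"}}
host = {"hmiR-A": {"G1", "G2", "G3", "G5"}, "hmiR-B": {"G6", "G7"}}
for (v, v_targets), (h, h_targets) in product(viral.items(), host.items()):
    p = overlap_pvalue(v_targets, h_targets, background_size=200)
    if p < 0.01:
        print(v, h, p)
```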
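Point 7 groups viruses with random walk with restart (RWR). The abstract does not describe the exact setup, so this is only a minimal sketch of the RWR iteration itself, p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized adjacency matrix of an undirected network, p(0) spreads the restart mass uniformly over a set of seed nodes (for instance, the host molecules targeted by one virus), and r is the restart probability (a hypothetical default of 0.3). Each virus's converged vector could then serve as a proximity profile for clustering; that clustering step, and how six clusters were chosen, are not shown. The function name and the toy graph are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seeds`.

    adjacency: dict mapping every node to a list of its neighbours (undirected,
    every neighbour must also be a key). seeds: nodes that receive restart mass.
    """
    nodes = list(adjacency)
    seeds = set(seeds)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            nbrs = adjacency[u]
            if nbrs:
                # Column-normalized W: u splits its walking mass evenly among neighbours.
                share = (1.0 - restart) * p[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # Dangling node: return its walking mass to the seed distribution.
                for v in nodes:
                    nxt[v] += (1.0 - restart) * p[u] * p0[v]
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network (hypothetical): a path A - B - C - D, seeded at A.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(random_walk_with_restart(graph, {"A"}))
```

Because the restart term keeps pulling mass back to the seeds, nodes close to a virus's targets receive high scores, which is why RWR profiles are a natural input for grouping viruses by network proximity.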
Keywords/Search Tags: RNA-RNA interaction (RRI), text-mining, N-gram model, viral infection, ncRNA-associated crosstalk network