
Prediction Algorithms Based On Network Models For The Problems Associated With Genes

Posted on: 2014-06-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Guo
GTID: 1260330398497856
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology and the continuous improvement of theory and technology across disciplines, different fields increasingly penetrate and merge with one another. Bioinformatics, the interdiscipline of computational molecular biology and information processing science, has attracted much attention: it applies theories and techniques from information science, mathematics, physics and chemistry to problems in biology, with computer-aided analysis providing sound support. As a result, bioinformatics presents a number of hard computational problems to researchers in the information field. Many computational prediction problems in bioinformatics are highly challenging, such as predicting candidate genes involved in complex diseases (i.e., disease genes), gene function prediction, microRNA target prediction and molecular interaction prediction. Reliable predictions can provide suggestions and clues for biological experiments, dramatically reduce their cost in manpower and time, and accelerate experimental progress. Algorithms for these prediction problems not only answer biological questions but also enrich the theory of algorithms itself, and are therefore of great value in both theoretical study and application.

This dissertation addresses two main biological prediction problems from a global and systematic perspective based on biological networks. Specifically, we investigate the following problems and make the following contributions.

(1) The identification of disease-causing genes is a fundamental challenge in human health: it is of great importance for improving medical care and provides a better understanding of gene functions. Recent computational approaches based on the interactions among human proteins and on disease similarities have proven powerful in tackling this issue.
Here, we provide a novel systematic and global method that integrates two heterogeneous networks to prioritize candidate disease genes, based on the observation that genes causing the same or similar diseases tend to lie close to one another in a protein-protein interaction network. In this method, the association score between a query disease and a candidate gene is defined as the weighted sum of the association scores between similar diseases and neighbouring genes; moreover, the topological correlation of the two heterogeneous networks is incorporated into the definition of the score function, and an iterative algorithm is designed to compute it. In 10-fold cross-validation tests, the method significantly outperformed a state-of-the-art method called PRINCE. The method was also applied to three multifactorial disorders (breast cancer, Alzheimer's disease and type 2 diabetes mellitus), yielding suggestions of novel causal genes and candidate disease-causing subnetworks for further investigation.

(2) A large number of long non-coding RNAs (lncRNAs) have been identified by large-scale analyses of full-length cDNA sequences, chromatin-state maps and other analyses based on RNA-seq data. Their specific properties and complicated biological functions have drawn widespread attention, yet the functions of most lncRNAs remain to be determined, and there is a critical need to annotate the functions of the growing number of available lncRNAs. Functional characterization of lncRNAs is, however, a challenging task. To address it, we analyze the biological properties of lncRNAs and select suitable features for predicting their functions. In this dissertation, we apply a global network-based strategy to this problem for the first time.
We develop a bi-colored network-based global function predictor, named long noncoding RNA Global Function-Predictor (lnc-GFP), to predict probable functions of lncRNAs at large scale by integrating gene expression data and protein interaction data. The performance of lnc-GFP is evaluated on both protein-coding and lncRNA genes. Cross-validation tests on protein-coding genes with known function annotations indicate that the method achieves a precision of up to 95% with a suitable parameter setting. Among the 1713 lncRNAs in the bi-colored network, all 1625 (94.9%) lncRNAs in the maximum connected component are functionally characterized, and the putative functions our method infers for many lncRNAs closely match the existing literature. Building on the success of lnc-GFP in predicting functions for lncRNAs in the mouse bi-colored network, we integrate lnc-GFP into ncFANs, the first web server designed to facilitate functional annotation of lncRNAs. We also present ncFANs 2.0 (http://www.bioinfo.org/ncFANs/), a substantial upgrade to the original web server, dedicated to comprehensive, large-scale functional annotation of lncRNAs.
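The disease-gene scoring in contribution (1) can be illustrated with a minimal network-propagation sketch. It assumes one plausible formulation of the iterative algorithm, S <- alpha * D S G + (1 - alpha) * S0, where S0 holds known disease-gene associations, D and G are the symmetrically normalized disease-similarity and protein-interaction matrices, and alpha is a hypothetical parameter balancing propagated evidence against the known associations. Each update makes a disease-gene score a weighted sum of the scores of similar diseases and neighbouring genes. The topological-correlation weighting described in the dissertation is not modeled, and the function names (sym_normalize, prioritize) are illustrative, not the author's implementation.

```python
import numpy as np


def sym_normalize(W):
    """Return D^{-1/2} W D^{-1/2}; rows/columns of zero-degree nodes stay zero."""
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]


def prioritize(disease_sim, ppi, known, alpha=0.5, tol=1e-8, max_iter=1000):
    """Hypothetical propagation over a disease network and a PPI network.

    disease_sim: (m, m) disease-disease similarity matrix
    ppi:         (n, n) protein-protein interaction adjacency matrix
    known:       (m, n) 0/1 matrix of known disease-gene associations (prior S0)
    Returns an (m, n) matrix of association scores.
    """
    D = sym_normalize(disease_sim)
    G = sym_normalize(ppi)
    S0 = np.asarray(known, dtype=float)
    S = S0.copy()
    for _ in range(max_iter):
        # Score(d, g) becomes a weighted sum of the scores of diseases similar
        # to d (left product) and genes adjacent to g (right product), plus a
        # restart term that keeps the known associations anchored.
        S_new = alpha * D @ S @ G + (1 - alpha) * S0
        if np.abs(S_new - S).max() < tol:
            return S_new
        S = S_new
    return S


if __name__ == "__main__":
    # Toy data: 2 similar diseases, 4 genes; PPI edges g0-g1 and g1-g2,
    # g3 is isolated. The only known association is d1-g0.
    disease_sim = np.array([[1.0, 0.8], [0.8, 1.0]])
    ppi = np.zeros((4, 4))
    for a, b in [(0, 1), (1, 2)]:
        ppi[a, b] = ppi[b, a] = 1.0
    known = np.array([[0, 0, 0, 0], [1, 0, 0, 0]])
    S = prioritize(disease_sim, ppi, known)
    # Candidate genes for query disease d0, best first; g1 (a PPI neighbour
    # of g0) scores above the isolated gene g3.
    print(np.argsort(-S[0]))
```

Because both normalized matrices have spectral norm at most 1, the update is a contraction for alpha < 1, so the loop converges; a larger alpha lets evidence travel further through both networks, while alpha = 0 simply returns the known associations.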
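Contribution (2) predicts lncRNA functions over a bi-colored network whose nodes are protein-coding genes and lncRNAs. The sketch below uses a generic label-propagation scheme as an assumed stand-in for lnc-GFP's scoring: protein interactions supply coding-coding edges, co-expression supplies the edges that touch lncRNAs, and scores F <- beta * W F + (1 - beta) * Y spread known annotation labels Y from coding genes to unannotated lncRNAs. The helper names (build_bicolored, propagate_functions), the additive combination of the two data sources and the parameter beta are all hypothetical; lnc-GFP's actual procedure may differ.

```python
import numpy as np


def build_bicolored(coexpr, ppi, n_coding):
    """Hypothetical bi-colored network: nodes 0..n_coding-1 are protein-coding
    genes, the remaining nodes are lncRNAs.

    coexpr: (N, N) symmetric co-expression weights among all N genes
    ppi:    (n_coding, n_coding) protein interaction adjacency
    """
    W = np.array(coexpr, dtype=float)  # copy, so the input is not modified
    W[:n_coding, :n_coding] += ppi     # add protein-interaction edges
    np.fill_diagonal(W, 0.0)           # no self-loops
    return W


def propagate_functions(W, Y, beta=0.8, tol=1e-8, max_iter=1000):
    """Label propagation F <- beta * Wn @ F + (1 - beta) * Y.

    W: (N, N) symmetric non-negative adjacency of the bi-colored network
    Y: (N, K) 0/1 known annotations (lncRNA rows are all zero)
    Returns (N, K) scores; a larger score means the function is more likely.
    """
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    Wn = W * inv_sqrt[:, None] * inv_sqrt[None, :]  # symmetric normalization
    Y = np.asarray(Y, dtype=float)
    F = Y.copy()
    for _ in range(max_iter):
        F_new = beta * Wn @ F + (1 - beta) * Y
        if np.abs(F_new - F).max() < tol:
            return F_new
        F = F_new
    return F


if __name__ == "__main__":
    # Toy network: coding genes g0, g1 (interacting) and lncRNAs l2, l3;
    # l2 is co-expressed with g0, l3 with g1.
    coexpr = np.zeros((4, 4))
    coexpr[0, 2] = coexpr[2, 0] = 0.9
    coexpr[1, 3] = coexpr[3, 1] = 0.9
    ppi = np.array([[0.0, 1.0], [1.0, 0.0]])
    Y = np.array([[1, 0], [0, 1], [0, 0], [0, 0]])  # g0 has f0, g1 has f1
    F = propagate_functions(build_bicolored(coexpr, ppi, n_coding=2), Y)
    # l2 receives a higher score for f0 (from its partner g0) than for f1.
    print(F[2])
```

In this sketch, an lncRNA that lies in a connected component with no annotated genes keeps an all-zero score row, so only lncRNAs linked to annotated genes through the network receive predictions.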
Keywords/Search Tags: disease gene, long non-coding RNA, function prediction, network model, classification algorithm