From Pieces to Paths: Combining Disparate Information in Computational Analysis of RNA-Se

Posted on: 2019-08-11
Degree: Ph.D.
Type: Dissertation
University: Purdue University
Candidate: Yang, Yifan
Full Text: PDF
GTID: 1458390005994338
Subject: Bioinformatics
Abstract/Summary:
As high-throughput sequencing technology has advanced in recent decades, large-scale, high-resolution genomic data have been generated to solve problems in many fields. One state-of-the-art sequencing technique is RNA sequencing, which has been widely used to study the transcriptomes of biological systems through millions of reads. The ultimate goal of bioinformatics algorithms for RNA sequencing is to make maximal use of the information stored in a large number of pieced-together reads in order to unveil the whole landscape of biological function at the transcriptome level.

Many bioinformatics methods and pipelines have been developed to better achieve this goal. However, one central problem in RNA sequencing is prediction uncertainty caused by short read length and by the low sampling rate of underexpressed transcripts. Both conditions introduce ambiguity into read mapping, transcript assembly, transcript quantification, and even downstream analysis.

This dissertation focuses on approaches to reducing this uncertainty by incorporating additional information, of disparate kinds, into bioinformatics models and model assessments. We address three critical issues in RNA sequencing data analysis: (1) we evaluate the performance of current de novo assembly methods, and of the methods used to evaluate them, using transcript information from a third-generation sequencing platform, which provides longer reads but at a higher error rate than next-generation sequencing; (2) we build a Bayesian graphical model that improves transcript quantification and the identification of differentially expressed isoforms by using the information shared among biological replicates; (3) we build a joint pathway and gene selection model that incorporates pathway structures from an expert database.

We conclude that incorporating appropriate information from additional resources enables more reliable assessment and higher prediction performance in RNA sequencing data analysis.
Keywords/Search Tags: RNA, Information, Data