
Research on Key Technologies of Transcriptome Reconstruction and Analysis Based on Information Channel Model

Posted on: 2016-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
GTID: 2310330503456371
Subject: Control Science and Engineering
Abstract/Summary:
Automated isoform discovery and quantification from high-throughput RNA sequencing (RNA-Seq) data are important tasks in next-generation sequencing (NGS) research. They are useful for exploring the landscapes of the genome and transcriptome, and have significant applications in research such as differential gene expression analysis and studies of gene regulation mechanisms. However, these tasks are challenging because of the uncertainties that arise from reconstructing the full transcriptome from partial observations: RNA-Seq reads cover only part of the transcriptome, and there is unpredictable information loss. Existing methods suffer from low accuracy and dependence on annotation.
In this dissertation, we propose an information-theoretic model to tackle these problems. We model RNA-Seq as an information transduction system, with the transcriptome modeled as the information source and the reads as the observed signals. Data uncertainties are explicitly reduced by exploiting the information transduction capacity. The details are as follows.
Firstly, we perform automated assembly of gene structures and reconstruction of candidate isoforms. In annotation-independent mode, the assembly proceeds in hierarchical stages: coarse delimitation of expressed regions, detection of gene loci, and refinement of the exon identification results. Directed graphs are then built for isoform search, and several kinds of path costs based on the graph structure are defined to select candidate isoforms effectively.
Secondly, we perform simultaneous isoform discovery and abundance estimation based on a model of maximal information transduction capacity. The RNA-Seq procedure is modeled as a process of information transmission, and mutual information is used to measure the dependence between isoforms and reads. We directly model and control the uncertainties caused by missing information and ambiguous read mapping.
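The abstract does not give the criteria used in the first assembly stage. As an illustration only, coarse delimitation of expressed regions can be sketched as thresholding per-base read coverage and merging runs separated by short low-coverage gaps. The function `expressed_regions` and its parameters `min_cov`, `max_gap` and `min_len` are hypothetical, not taken from the thesis.

```python
from typing import List, Optional, Sequence, Tuple


def expressed_regions(
    coverage: Sequence[int], min_cov: int = 2, max_gap: int = 0, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Delimit expressed regions as half-open [start, end) intervals.

    Hypothetical rule: a position is 'expressed' when its coverage is >= min_cov;
    runs separated by at most max_gap unexpressed bases are merged, and merged
    regions shorter than min_len are discarded.
    """
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, c in enumerate(coverage):
        if c >= min_cov and start is None:
            start = i  # a new run begins
        elif c < min_cov and start is not None:
            runs.append((start, i))  # the current run ends before position i
            start = None
    if start is not None:
        runs.append((start, len(coverage)))  # run extends to the end

    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)  # bridge a short low-coverage gap
        else:
            merged.append((s, e))
    return [(s, e) for s, e in merged if e - s >= min_len]


coverage = [0, 0, 3, 4, 5, 0, 2, 2, 0, 0, 0, 0, 6, 6, 0]
print(expressed_regions(coverage, min_cov=2, max_gap=1))  # [(2, 8), (12, 14)]
```

Bridging small gaps keeps a single dip in coverage from splitting one expressed region in two; later stages (gene loci detection, exon refinement) would then work within these coarse intervals.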
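The abstract mentions "different kinds of path costs" without defining them. The sketch below shows only the general pattern: enumerate source-to-sink paths in a directed acyclic splice graph (exons as nodes, junctions as edges weighted by junction-spanning reads) and rank them by one assumed cost, the sum of negative log branching fractions. The names `enumerate_paths` and `rank_candidates`, and the cost itself, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (upstream exon, downstream exon)


def enumerate_paths(edges: Dict[Edge, int], source: int, sink: int) -> List[List[int]]:
    """All source-to-sink paths; assumes the splice graph is acyclic (a DAG)."""
    succ: Dict[int, List[int]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    paths: List[List[int]] = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        if path[-1] == sink:
            paths.append(path)
            continue
        for nxt in succ.get(path[-1], []):
            stack.append(path + [nxt])
    return paths


def rank_candidates(edges: Dict[Edge, int], source: int, sink: int, top_k: int = 3):
    """Rank paths by an assumed cost: sum of -log(edge reads / node's outgoing reads)."""
    out_total: Dict[int, int] = {}
    for (u, _), reads in edges.items():
        out_total[u] = out_total.get(u, 0) + reads
    scored = []
    for path in enumerate_paths(edges, source, sink):
        cost = sum(-math.log(edges[(u, v)] / out_total[u]) for u, v in zip(path, path[1:]))
        scored.append((cost, path))
    scored.sort(key=lambda item: item[0])  # lower cost = better-supported isoform
    return scored[:top_k]


# Toy splice graph: exons 0..3, edge weights are junction-spanning read counts.
junctions = {(0, 1): 30, (0, 2): 10, (1, 2): 20, (1, 3): 10, (2, 3): 30}
for cost, path in rank_candidates(junctions, source=0, sink=3):
    print(f"{path}  cost={cost:.3f}")
```

With this cost, exp(-cost) is the product of branching fractions along a path, so in the toy graph the path 0, 1, 2, 3 ranks first with cost ln 2 (about 0.693). Exhaustive enumeration grows exponentially with the number of alternative exons, so a practical search would need pruning; this toy version has none.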
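The abstract does not spell out the objective used for simultaneous discovery and quantification. One concrete reading, offered only as an illustration, treats the isoform of origin X as the channel input, a read's compatibility class Y as the output, and a matrix W[x][y] = P(y | x) as the channel. Mutual information I(X;Y) then measures how much the reads reveal about their source isoforms, and the standard Blahut-Arimoto iteration maximizes it over input distributions, i.e., computes the channel capacity. The function names and the toy channel are hypothetical, and how the thesis couples capacity with observed read counts to estimate abundances is not stated in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # W[x][y] = P(read class y | isoform x)


def output_dist(p: Sequence[float], W: Matrix) -> List[float]:
    """q(y) = sum_x p(x) W[x][y]: the distribution of read classes."""
    return [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]


def mutual_information(p: Sequence[float], W: Matrix) -> float:
    """I(X;Y) in nats between isoform of origin X and read class Y."""
    q = output_dist(p, W)
    return sum(
        p[x] * W[x][y] * math.log(W[x][y] / q[y])
        for x in range(len(p))
        for y in range(len(q))
        if p[x] > 0 and W[x][y] > 0
    )


def blahut_arimoto(W: Matrix, iters: int = 1000, tol: float = 1e-12) -> Tuple[float, List[float]]:
    """Maximize I(X;Y) over input distributions p (channel capacity, in nats)."""
    n_x = len(W)
    p = [1.0 / n_x] * n_x  # start from equal abundances
    for _ in range(iters):
        q = output_dist(p, W)
        # Relative entropy D(W[x] || q) for each isoform x.
        d = [sum(w * math.log(w / qy) for w, qy in zip(W[x], q) if w > 0) for x in range(n_x)]
        weights = [p[x] * math.exp(d[x]) for x in range(n_x)]
        z = sum(weights)
        new_p = [w / z for w in weights]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return mutual_information(p, W), p


# Two isoforms; read classes = (shared exons, isoform-1-only, isoform-2-only).
W = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5]]
print(mutual_information([0.5, 0.5], W))  # 0.5 * ln 2, about 0.3466
print(blahut_arimoto(W))
```

In the toy channel, half of each isoform's reads fall in the shared exons and say nothing about their origin, so the channel behaves like a binary erasure channel with erasure probability 0.5 and capacity 0.5 ln 2 nats. A fully ambiguous read class contributes nothing to I(X;Y), which is one way to see how ambiguous read mapping raises uncertainty about isoform abundances.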
Experimental results demonstrate the advantages of our method in handling complex gene/isoform structures and in discovering lowly expressed isoforms.
Thirdly, the algorithmic framework is extended to the annotation-dependent mode, and the algorithms are implemented in open-source transcriptome analysis software for public use. Given annotations, we compare and combine the assembled genes/isoforms with the annotated ones to identify novel gene loci and splicing structures. The software can flexibly accommodate different modes and applications.
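Only the comparison half of "compare and combine" is sketched here, under an assumed rule set: an assembled transcript that overlaps no annotated transcript marks a candidate novel gene locus; one that overlaps an annotated transcript and reproduces its intron chain exactly is treated as known; anything else is a candidate novel splicing structure. The `Transcript` type, `classify`, and the category names are hypothetical and are not the interface of the thesis software.

```python
from typing import Iterable, NamedTuple, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive


class Transcript(NamedTuple):
    chrom: str
    strand: str
    exons: Tuple[Exon, ...]  # sorted by start coordinate


def intron_chain(t: Transcript) -> Tuple[Tuple[int, int], ...]:
    """Splice junctions as (upstream exon end, downstream exon start) in genomic order."""
    return tuple((left[1], right[0]) for left, right in zip(t.exons, t.exons[1:]))


def overlaps(a: Transcript, b: Transcript) -> bool:
    """Same chromosome and strand, and overlapping genomic spans."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.exons[0][0] <= b.exons[-1][1] and b.exons[0][0] <= a.exons[-1][1])


def classify(assembled: Transcript, annotated: Iterable[Transcript]) -> str:
    hits = [ref for ref in annotated if overlaps(assembled, ref)]
    if not hits:
        return "novel_locus"  # no annotated transcript overlaps it
    chain = intron_chain(assembled)
    # Simplification: single-exon transcripts (empty chain) match any overlapping
    # single-exon reference.
    if any(intron_chain(ref) == chain for ref in hits):
        return "known_isoform"  # identical splice structure; ends may differ
    return "novel_isoform"  # overlaps a known gene but splices differently


annotation = [Transcript("chr1", "+", ((100, 200), (300, 400), (500, 600)))]
queries = {
    "trimmed_ends": Transcript("chr1", "+", ((120, 200), (300, 400), (500, 580))),
    "exon_skip": Transcript("chr1", "+", ((100, 200), (500, 600))),
    "intergenic": Transcript("chr1", "+", ((5000, 5100), (5200, 5300))),
}
for name, t in queries.items():
    print(name, classify(t, annotation))
```

Matching on the intron chain rather than on exact exon boundaries keeps assembled transcripts with slightly shorter or longer terminal exons from being reported as novel.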
Keywords/Search Tags: isoform discovery, abundance estimation, information transduction capacity, automated assembly, transcriptome analysis software