RNA-Seq Data Analysis Based On Smoothed LDA

Posted on:2017-10-02

Degree:Master

Type:Thesis

Country:China

Candidate:S H Ou

Full Text:PDF

GTID:2310330503995784

Subject:Software engineering

Abstract/Summary:

With the rapid development of high-throughput sequencing technology, RNA-Seq has become an important technique for transcriptome research. Compared with microarray technology, RNA-Seq offers a higher signal-to-noise ratio, higher sensitivity and lower sample requirements. However, estimating expression from RNA-Seq data still faces several challenges, such as the read multi-mapping problem and the non-uniform distribution of reads. To address these problems, this thesis proposes a novel method, sLDASeq, to estimate gene and isoform expression. The model adds a hyper-parameter that accounts for the sparsity between exons and isoforms. It uses the known genome annotation to constrain the hyper-parameters and allocates read counts according to exon length. In addition, sLDASeq introduces a latent variable to model the non-uniform distribution of reads along each gene. Finally, we validate the model on a simulated dataset and several real RNA-Seq datasets. The results show that sLDASeq obtains more accurate gene and isoform expression levels than other popular methods.

In RNA-Seq data analysis, detecting differential gene and isoform expression is a fundamental research goal. Many methods have been proposed to find differential isoform expression. However, most of them can only detect differential expression of each individual isoform. They cannot detect differential isoform usage within the same gene between two conditions. We therefore propose a new method to detect differential isoform usage. The method builds on sLDASeq and applies the KL divergence to the probabilities of sLDASeq's latent variables to detect differential isoform usage. A simulated dataset and several real RNA-Seq datasets are used to validate the method's performance. sLDASeq accurately estimates the ratio of isoform expression. In addition, sLDASeq combined with KL divergence accurately detects differential isoform usage in the simulated dataset.
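The abstract states that sLDASeq allocates read counts according to exon length. The minimal sketch below illustrates that idea under one assumption: reads are spread uniformly along a gene's exons, so each exon's expected share of the gene's reads is proportional to its length. The function name `allocate_by_length` and the example exon lengths are hypothetical. The thesis performs this allocation inside its smoothed LDA model, and that step may differ from this sketch.

```python
def allocate_by_length(total_reads: float, exon_lengths: dict[str, int]) -> dict[str, float]:
    """Split a read count across exons in proportion to exon length.

    Hypothetical helper: assumes uniform read coverage along the exons,
    which is only an illustration of length-based allocation.
    """
    total_length = sum(exon_lengths.values())
    if total_length <= 0:
        raise ValueError("exon lengths must sum to a positive value")
    # Each exon receives total_reads * (its length / total exon length).
    return {exon: total_reads * length / total_length
            for exon, length in exon_lengths.items()}


counts = allocate_by_length(300, {"exon1": 100, "exon2": 200, "exon3": 300})
print(counts)  # {'exon1': 50.0, 'exon2': 100.0, 'exon3': 150.0}
```

Under a uniform-coverage assumption, a longer exon collects proportionally more reads. The non-uniformity latent variable described in the abstract is meant to relax exactly that assumption.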
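The differential-usage test compares a single gene's isoform probability distributions under two conditions using KL divergence. The sketch below assumes those per-gene isoform proportions have already been estimated. In the thesis they come from sLDASeq's latent variables; here they are passed in directly as lists. Three details are illustrative assumptions and do not come from the thesis: the pseudocount, the averaging of the two KL directions into a symmetric score, and the function names.

```python
import math


def normalize(weights, pseudocount=1e-6):
    """Turn non-negative isoform weights into a probability distribution.

    The pseudocount (an assumption, not from the thesis) keeps every
    probability positive so the log ratio below is always defined.
    """
    smoothed = [w + pseudocount for w in weights]
    total = sum(smoothed)
    return [w / total for w in smoothed]


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def isoform_usage_kl(usage_a, usage_b, pseudocount=1e-6):
    """Hypothetical per-gene score comparing isoform usage in two conditions."""
    if len(usage_a) != len(usage_b):
        raise ValueError("both conditions must list the same isoforms")
    p = normalize(usage_a, pseudocount)
    q = normalize(usage_b, pseudocount)
    # Averaging both directions (an assumption) makes the score order-independent.
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


same = isoform_usage_kl([0.7, 0.2, 0.1], [0.7, 0.2, 0.1])
switched = isoform_usage_kl([0.7, 0.2, 0.1], [0.1, 0.2, 0.7])
print(same, round(switched, 3))
```

A gene whose isoform usage is identical in both conditions scores 0. Swapping the dominant isoform gives a large score. The abstract does not say how a cutoff on this score is chosen.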

Keywords/Search Tags:

RNA-Seq, gene and transcript expression, smoothed LDA, exon-junction, KL Divergence, multi-mapping, non-uniformity, differential isoform usage

Related items

1	Transcript Expression Analysis For Multi-conditional RNA-seq Data
2	Research On Exon Recognition Algorithm Based On Multi-Mapping Optimal Window
3	A Recognition Algorithm Of Gene Exon Based On Fourier Transform And Numerical Mapping
4	Identification Of Tissue-specific Alternative Splicing Of Chinese Population Using Human Whole Genome Exon Microarray
5	Improved Transcriptome Expression Analysis For RNA-Seq Data
6	Research Of Exon-Intron Structures In The DNA Sequences
7	Interaction Between Exon-exon Sequences And Intron Sequences
8	Research On Isoform-isoform Interactions Prediction Based On Deep Multi-instance Learning
9	Clone And Identification And Of A New Isoform Of Human Itsn2 Gene
10	Cloning Of A Novel Exon Of Thyroid Hormone Receptor β Gene And Its Expression In E.coli