| With the rapid development of high-throughput sequencing technology, RNA-Seq is becoming an important technique for transcriptome research. Compared with microarray technology, RNA-Seq offers a higher signal-to-noise ratio, higher sensitivity, and lower sample requirements. However, estimating expression from RNA-Seq data still faces challenges, such as the multi-mapping of reads and the non-uniform distribution of reads. To address these problems, this thesis proposes a novel method, sLDASeq, to estimate gene and isoform expression. The model adds a hyper-parameter that captures the sparsity of the relationship between exons and isoforms. It then uses the known genome annotation to constrain the hyper-parameters and allocates read counts according to exon length. In addition, sLDASeq adopts a latent variable to account for the non-uniform distribution of reads along each gene. Finally, we use a simulated dataset and several real RNA-Seq datasets to validate the model. The results show that sLDASeq obtains more accurate gene and isoform expression levels than other popular methods. In RNA-Seq data analysis, detecting differential gene and isoform expression is a fundamental research goal. Many methods have been proposed to find differential isoform expression. However, most of them can only detect differential expression of each individual isoform, rather than differential isoform usage within the same gene between two conditions. Therefore, we propose a new method to detect differential isoform usage. The method is based on sLDASeq and applies the KL divergence to the probabilities of the latent variables of sLDASeq to detect differential isoform usage. A simulated dataset and several real RNA-Seq datasets are used to validate the performance of the method. sLDASeq can accurately estimate the ratio of isoform expression. In addition, sLDASeq combined with KL divergence can accurately detect differential isoform usage in the simulated dataset. |
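
The KL divergence step mentioned above can be illustrated with a minimal sketch. It is not the sLDASeq implementation: the function name `kl_divergence`, the smoothing constant `eps`, and the example proportions are all hypothetical, and the abstract does not specify the direction of the divergence or any decision threshold. The sketch only shows how a divergence between the isoform proportions of one gene under two conditions yields a score that is zero when usage is unchanged and grows as usage shifts.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions over the same isoforms.

    Both inputs are normalized to sum to 1. `eps` is an assumed smoothing
    floor for q that avoids division by zero; it is not taken from the thesis.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same set of isoforms")
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i = p_i / p_sum
        q_i = max(q_i / q_sum, eps)
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical isoform usage proportions for one gene with three isoforms.
usage_condition_a = [0.7, 0.2, 0.1]
usage_condition_b = [0.2, 0.7, 0.1]

score_unchanged = kl_divergence(usage_condition_a, usage_condition_a)
score_switched = kl_divergence(usage_condition_a, usage_condition_b)
print(score_unchanged, score_switched)
```

A gene whose dominant isoform switches between conditions, as in the hypothetical example, receives a clearly positive score even if its total expression is unchanged, which is the distinction between differential isoform usage and per-isoform differential expression drawn above.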