Modeling And Correlation Estimation Of RNA-Seq Data

Posted on:2024-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Yin

Full Text:PDF

GTID:2530306929990729

Subject:Probability theory and mathematical statistics

Abstract/Summary:

RNA-Seq is a high-throughput sequencing technique used mainly for transcriptome analysis. Estimating the expression levels of specific gene isoforms and estimating the correlation between genes are two important problems in transcriptomics, and RNA-Seq experiments make it possible to quantify both. However, RNA-Seq data are often high-dimensional and overdispersed, which makes it difficult to measure expression levels and correlations accurately.

This thesis first summarizes, from the relevant literature, the two most widely used classes of models for RNA-Seq data. Models based on the Poisson distribution can effectively handle the multi-source mapping problem for isoforms; typical examples are the RNA-Seq data model with uniform sampling, the model with non-uniform sampling, and the multi-source RNA-Seq data model. Most Poisson models, however, assume that sequencing fragments are sampled uniformly, which is inconsistent with practice. Bias-correction models do not need this assumption: they incorporate the bias generated by the RNA-Seq experiment into the model and can therefore fit real data better. Three bias-correction models are introduced: Hammer, WemIQ, and NLDMseq.

Second, three main methods for estimating the correlation between genes from RNA-Seq data are reviewed. PCAN estimates the correlation between two non-normal data sets using the correlation estimated at the level of the natural parameters of the Poisson model. PLNseq models the data with a multivariate Poisson lognormal distribution, makes inferences by likelihood methods, and proposes a three-stage numerical algorithm to estimate the unknown parameters. A third approach uses the multivariate Poisson lognormal model and the method of moments to estimate the correlation coefficients for the three possible mRNA conditions of genes. On this basis, a zero-truncated Poisson lognormal model is established to estimate correlation coefficients accurately when a large number of zero reads would otherwise distort the estimated gene correlations.

Finally, the data models and correlation methods are discussed in detail, and their advantages and limitations are set out.
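As an illustration of the moment-based idea mentioned in the abstract, the following Python sketch simulates counts for two genes from a bivariate Poisson lognormal model and recovers the latent (log-scale) correlation from sample means, variances, and covariance. It relies on the standard Poisson lognormal moment identities E[Y_j] = m_j, Var(Y_j) = m_j + m_j^2 (exp(s_jj) - 1), and Cov(Y_j, Y_k) = m_j m_k (exp(s_jk) - 1). The function names `simulate_pln` and `pln_moment_correlation` and all parameter values are hypothetical, chosen only for illustration; this is not necessarily the estimator used in the thesis, and it ignores sequencing depth, covariates, and zero truncation.

```python
import numpy as np


def simulate_pln(n, mu, sigma, rng):
    """Draw n count vectors from a multivariate Poisson lognormal model.

    log(rate) ~ Normal(mu, sigma), counts | rate ~ Poisson(rate).
    """
    log_rate = rng.multivariate_normal(mu, sigma, size=n)
    return rng.poisson(np.exp(log_rate))


def pln_moment_correlation(counts):
    """Method-of-moments estimate of the latent correlation for two genes.

    counts: array of shape (n_samples, 2) holding read counts.
    """
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    cov = np.cov(counts, rowvar=False)

    # Variance in excess of the Poisson variance (which equals the mean).
    excess = np.diag(cov) - mean
    if np.any(excess <= 0):
        raise ValueError("no overdispersion: moment estimate is undefined")

    # Invert Var(Y_j) = m_j + m_j^2 (exp(s_jj) - 1) for s_jj.
    s_diag = np.log1p(excess / mean**2)
    # Invert Cov(Y_1, Y_2) = m_1 m_2 (exp(s_12) - 1) for s_12.
    s_12 = np.log1p(cov[0, 1] / (mean[0] * mean[1]))
    return s_12 / np.sqrt(s_diag[0] * s_diag[1])


# Illustrative (made-up) parameters for two genes.
rng = np.random.default_rng(0)
mu = np.array([2.0, 1.5])
sigma = np.array([[0.5, 0.3],
                  [0.3, 0.6]])
true_rho = sigma[0, 1] / np.sqrt(sigma[0, 0] * sigma[1, 1])

counts = simulate_pln(50_000, mu, sigma, rng)
rho_latent = pln_moment_correlation(counts)
rho_naive = np.corrcoef(counts, rowvar=False)[0, 1]

print(f"true latent correlation:     {true_rho:.3f}")
print(f"moment estimate (latent):    {rho_latent:.3f}")
print(f"Pearson correlation (counts): {rho_naive:.3f}")
```

Under these assumptions the Pearson correlation of the raw counts is attenuated relative to the latent correlation, because Poisson sampling noise inflates each gene's variance without adding covariance. The moment estimate removes that extra variance before forming the ratio.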

Keywords/Search Tags:

RNA-Seq, Data model, Gene isoform expression, Correlation estimation, Multivariate Poisson lognormal model

Related items

1	Modeling the Correlation Structure of RNA Sequencing Data Using A Multivariate Poisson-Lognormal Model
2	Poisson Lognormal Integer-valued GARCH Model
3	Correlation Research Of Multivariate Failure Time Data Based On Generalized Estimating Equations
4	CUSUM Control Chart Design For Multivariate Poisson Distribution And Time Series Model
5	Population Density Estimation Based On Distance
6	The Estimation Of Two-component Poisson Mixture Model With Robust Random Effects
7	Modeling with the composite lognormal-Pareto models and the composite Weibull-Pareto models
8	Study On Statistical Process Control Chart Of Multivariate Count Data
9	The Construction Of Gene Regulation Network By Mathematical Models
10	The Summary Of Gene Expression Level Based On Primary Affymetrix Array Data