| RNA-Seq is a high-throughput sequencing technique used mainly for transcriptome analysis. Estimating the expression levels of specific gene isoforms and estimating the correlation between genes are two important problems in transcriptomics, and both quantities can be estimated numerically from RNA-Seq experiments. However, RNA-Seq data are often high-dimensional and overdispersed, which makes it difficult to measure expression levels and correlations accurately. Drawing on the relevant literature, this paper first summarizes the two most widely used types of RNA-Seq data models. Models based on the Poisson distribution can effectively handle the multi-source mapping of isoforms; typical examples are the uniform-sampling model, the non-uniform-sampling model, and the multi-source RNA-Seq data model. Most Poisson models, however, assume that sequencing fragments are sampled uniformly, which is inconsistent with practice. Bias-correction models do not need this assumption: they incorporate the bias generated by the RNA-Seq experiment into the model and can therefore fit real data better. This paper introduces three bias-correction models: Hammer, WemIQ, and NLDMseq. Second, for estimating the correlation between genes from RNA-Seq data, there are three main methods. PCAN estimates the correlation between two non-normal data sets using the correlation estimated at the level of the natural parameters of the Poisson model. PLNseq models the data with a multivariate Poisson lognormal distribution, makes inferences using likelihood methods, and proposes a three-stage numerical algorithm to estimate the unknown parameters. The third method uses the multivariate Poisson lognormal model together with the method of moments to estimate the correlation coefficients for the three possible mRNA conditions of genes. On this basis, a zero-truncated Poisson lognormal model is established to estimate accurate correlation coefficients when a large number of zero reads would otherwise distort the gene correlation. Finally, the data models and correlation methods are discussed in detail, and their advantages and limitations are described. |