Research on Distribution Model of Gene Expression Difference Based on Cancer scRNA-seq Data

Posted on: 2021-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: M Wu
GTID: 2404330611973155
Subject: Applied Mathematics
Abstract/Summary:
Advances in single-cell sequencing technology have produced many valuable data sets, the most typical of which are single-cell RNA-sequencing (scRNA-seq) data. Analysis of these data can identify unknown cell subtypes, characterize intratumor heterogeneity, and screen for tumor markers, thus providing a basis for studying the development of cancer and for its clinical diagnosis. Researchers have proposed a number of analytical methods for scRNA-seq data, including methods that study the distribution of scRNA-seq gene expression data, but no method has addressed the distribution of gene expression difference data. In this thesis, we observe the overall shape of the distribution of scRNA-seq gene expression difference data at each disease stage and propose corresponding distribution models based on the observed characteristics. Analysis of the parameters of each distribution model can reveal heterogeneity among tumor cells. In addition, given quantile thresholds, the proposed distribution models can be used to identify tumor-related genes. Studying scRNA-seq data from the perspective of distribution can provide a basis for clinical research on the occurrence and development of tumors. The main work of this thesis is as follows:

(1) Based on the disease stages of chronic myeloid leukemia (CML), we group the scRNA-seq data of different patients by stage. We then obtain the corresponding gene expression difference data by subtracting the reference-state data within each group. Analyzing the distribution of these difference data, we find that they follow a regular pattern: the left side of the distribution has a sharp peak and a heavy tail, while the right side resembles an exponential distribution. A single distribution can hardly capture both the asymmetry and the sharp peak and heavy tail, so we construct a linear stable exponential distribution (LSED) model to describe these regularities. We then compare LSED with the stable distribution and the Cauchy distribution in terms of fitted density curves, goodness-of-fit test results, and root mean square error (RMSE). The results indicate that LSED fits better than both the stable distribution and the Cauchy distribution. Further study of the LSED parameters shows that one parameter follows a trend as CML develops: its value increases in BCR-ABL+ stem cells, while it shows no significant change in BCR-ABL- stem cells. Gene set enrichment analysis (GSEA) shows that, compared with BCR-ABL- stem cells, BCR-ABL+ stem cells are highly enriched in CML-related pathways of proliferation, differentiation, apoptosis, and cell cycle. These results indicate that the LSED parameter can reveal the heterogeneity of stem cells in CML.

(2) Building on the analysis in (1), we explore distribution models of scRNA-seq data for other cancers. For scRNA-seq data of colorectal cancer (CRC), we first filter out low-expression data according to a given threshold and then group the data by disease stage. We obtain the corresponding gene expression difference data and observe the shape of the distribution at each stage. The left side of the distribution still has a sharp peak and a heavy tail, while the right side is approximately normal. We therefore construct a mixed stable Normal distribution (MSND) model to fit the data in each group and compare its fit with the stable distribution and the Cauchy distribution. The fitted density curves, goodness-of-fit tests, and RMSEs show that MSND fits better than both. In addition, we explore the distribution model of scRNA-seq gene expression difference data using all expression data, without filtering low-expression data. The left side of the distribution still has a sharp peak and a heavy tail, while the right side is approximately exponential. Thus, a mixed stable exponential distribution (MSED) model is constructed to fit these regularities. The fitted density curves, goodness-of-fit tests, and RMSEs show that MSED fits better than the stable distribution and the Cauchy distribution. Further analysis reveals that the parameters of the MSND and MSED models show different trends in different cell types at different stages, and GSEA results indicate that the enrichment is also distinct. These results indicate that the parameters of the MSND and MSED models can reveal the heterogeneity of different cell types in the tumor. In addition, given quantile thresholds, MSND and MSED can be used to identify tumor-related genes, and functional analysis indicates that the selected genes are highly correlated with CRC. Further analysis shows that the MSED model, built on data without low-expression filtering, performs better than the MSND model, built on filtered data. The MSED model also fits better than the LSED model in both CML and CRC.
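The workflow summarized in the abstract (difference data, a fitted heavy-tailed distribution, an RMSE comparison, and quantile-based gene selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's method. The abstract does not give the LSED, MSND, or MSED formulas, so the sketch fits only a Cauchy distribution, one of the baselines the thesis compares against, using a simple quartile-based estimate. The synthetic data, the gene names, the histogram range, the 5%/95% thresholds, and the normal comparison fit are all hypothetical choices made for the example.

```python
import math
import random
import statistics

def expression_difference(sample, reference):
    """Per-gene difference: expression in a sample minus the reference state."""
    return [s - r for s, r in zip(sample, reference)]

def fit_cauchy(data):
    """Quartile-based Cauchy estimate: location = median, scale = IQR / 2."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return q2, (q3 - q1) / 2.0

def cauchy_pdf(x, loc, scale):
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))

def cauchy_quantile(q, loc, scale):
    """Inverse CDF of the Cauchy distribution."""
    return loc + scale * math.tan(math.pi * (q - 0.5))

def histogram_density(data, lo, hi, bins):
    """Empirical density on [lo, hi): bin centers and count / (n * width)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        if lo <= x < hi:
            counts[min(bins - 1, int((x - lo) / width))] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    density = [c / (len(data) * width) for c in counts]
    return centers, density

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic stand-in data (assumption): heavy-tailed noise around a reference.
random.seed(0)
n_genes = 2000
genes = [f"gene_{i}" for i in range(n_genes)]  # hypothetical gene names
reference = [random.gauss(5.0, 1.0) for _ in range(n_genes)]
sample = [r + cauchy_quantile(random.random(), 0.0, 0.5) for r in reference]

diffs = expression_difference(sample, reference)

# Fit the Cauchy baseline and, for contrast only, a light-tailed normal.
loc, scale = fit_cauchy(diffs)
normal = statistics.NormalDist.from_samples(diffs)

# Compare each fitted density with the empirical density via RMSE.
centers, density = histogram_density(diffs, -5.0, 5.0, 40)
rmse_cauchy = rmse(density, [cauchy_pdf(x, loc, scale) for x in centers])
rmse_normal = rmse(density, [normal.pdf(x) for x in centers])

# Flag genes whose difference lies beyond the fitted model's quantile cut-offs.
low_cut = cauchy_quantile(0.05, loc, scale)
high_cut = cauchy_quantile(0.95, loc, scale)
selected = [g for g, d in zip(genes, diffs) if d < low_cut or d > high_cut]

print(f"Cauchy fit: loc={loc:.3f}, scale={scale:.3f}")
print(f"RMSE Cauchy={rmse_cauchy:.4f}, RMSE normal={rmse_normal:.4f}")
print(f"{len(selected)} genes fall outside the 5%/95% model quantiles")
```

A lower RMSE means the fitted density follows the empirical histogram more closely, which is the sense in which the abstract ranks its models against the stable and Cauchy baselines. Replacing `cauchy_pdf` and `cauchy_quantile` with the density and quantile function of another model would leave the rest of the comparison unchanged.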
Keywords/Search Tags: chronic myeloid leukemia (CML), colorectal cancer (CRC), linear stable exponential distribution (LSED) model, mixed stable Normal distribution (MSND) model, mixed stable exponential distribution (MSED) model