
Data Analysis Method Of Single Cell Transcriptome Sequencing Based On Quality Control

Posted on: 2021-11-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X Gao
GTID: 2480306569496244 | Subject: Applied Mathematics
Abstract/Summary:
Single-cell transcriptome sequencing is a rapidly developing technology of great significance for resolving cellular heterogeneity and discovering new cell types. A typical analysis consists of data preprocessing (including quality control and normalization), dimensionality reduction, clustering of cells into types, and screening for differentially expressed genes, which together reveal the relationship between genes and cell types. Each step has its own role: quality control removes low-quality cells, normalization makes cells comparable, feature selection keeps genes that carry genuine biological signal, dimensionality reduction lowers the dimension of the sequencing data, and clustering divides cells into types so that differentially expressed genes can be screened. Because single-cell data are high-dimensional and sparse, quality control is especially challenging, and its results strongly affect all subsequent analyses. This thesis integrates these steps into a unified analysis pipeline for single-cell transcriptome sequencing data.

The main focus is an improved quality control method based on three indicators: library size, the number of expressed genes, and the proportion of reads mapped to spike-in transcripts (or, alternatively, to the mitochondrial genome). Quality control proceeds in three steps. First, Spearman and Pearson product-moment correlation coefficients are used to separate cells with abnormal gene expression from normal cells. Second, a minimum quantile score and a weighted comprehensive quality score are defined from the selected indicators. Third, the quality of each sequencing library is assessed against a data-quality threshold to identify genuine technical artifacts. The method is applied to real data sets and compared with commonly used quality control methods.

Within the full pipeline, the data are processed with the improved quality control method and then normalized by logarithmic normalization. Of three established feature selection methods compared, namely highly variable genes (HVG), DANB-based feature selection, and M3Drop, M3Drop performs best. Principal component analysis is then used to reduce dimensionality, and the first five principal components, which gave the best results, are retained for subsequent analysis. The whole pipeline is evaluated empirically on a real data set: clustering the cells according to their heterogeneity yields 7 cell subtypes, and evaluating the clustering with the Adjusted Rand Index gives a value of 91.9% (0.919), which verifies the accuracy of the pipeline. The thesis closes with a summary of the work.
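The abstract says Spearman and Pearson correlation coefficients are used to separate cells with abnormal gene expression from normal cells, but it does not say what each cell is correlated against. The Python sketch below assumes one plausible reading: each cell's (log-scale) expression profile is compared with the gene-wise median profile across all cells, and a cell is flagged as abnormal when either coefficient falls below a cutoff. The median reference, the function names, and the 0.5 cutoff are hypothetical choices for illustration, not the thesis's definitions.

```python
import math
from statistics import median
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant vector; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def abnormal_cells(log_expr: Sequence[Sequence[float]], cutoff: float = 0.5) -> List[bool]:
    """Hypothetical screen: flag cells that correlate poorly with the median profile.

    log_expr is a cells x genes matrix of log-scale expression values.
    """
    n_genes = len(log_expr[0])
    ref = [median(cell[g] for cell in log_expr) for g in range(n_genes)]
    return [pearson(cell, ref) < cutoff or spearman(cell, ref) < cutoff
            for cell in log_expr]
```

Using both coefficients means a cell is flagged if either its linear agreement (Pearson) or its rank agreement (Spearman) with the reference is weak; Spearman is less sensitive to a few extreme genes.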
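The abstract names the three quality control indicators and two composite scores but does not give their formulas. The Python sketch below shows one plausible construction under stated assumptions: each indicator is converted to an empirical quantile across cells (with the mitochondrial or spike-in ratio inverted so that a higher quantile always means better quality), the minimum quantile score is a cell's smallest quantile, and the weighted comprehensive score is a weighted sum with hypothetical weights. The function names, weights, and thresholds are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class CellQC:
    library_size: int  # total counts in the cell
    n_genes: int       # number of genes with a nonzero count
    mito_ratio: float  # fraction of counts from mitochondrial (or spike-in) genes


def qc_metrics(counts: Sequence[Sequence[int]], mito_idx: Set[int]) -> List[CellQC]:
    """Compute the three per-cell indicators from a cells x genes count matrix."""
    cells = []
    for row in counts:
        total = sum(row)
        mito = sum(row[j] for j in mito_idx)
        ratio = mito / total if total else 1.0  # empty cell: treat as worst possible
        cells.append(CellQC(total, sum(1 for c in row if c > 0), ratio))
    return cells


def quantile_ranks(values: Sequence[float]) -> List[float]:
    """Empirical quantile: fraction of cells with a value <= this one (O(n^2) sketch)."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


def quality_scores(cells: Sequence[CellQC],
                   weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)
                   ) -> List[Tuple[float, float]]:
    """Return (minimum quantile score, weighted score) for each cell (hypothetical)."""
    q_lib = quantile_ranks([c.library_size for c in cells])
    q_gen = quantile_ranks([c.n_genes for c in cells])
    # Negate the ratio so that a low mitochondrial fraction gets a high quantile.
    q_mito = quantile_ranks([-c.mito_ratio for c in cells])
    return [(min(a, b, m), weights[0] * a + weights[1] * b + weights[2] * m)
            for a, b, m in zip(q_lib, q_gen, q_mito)]


def flag_low_quality(scores: Sequence[Tuple[float, float]],
                     min_q_cut: float = 0.3, weighted_cut: float = 0.5) -> List[bool]:
    """Flag a cell if either score falls below its (hypothetical) threshold."""
    return [mq < min_q_cut or ws < weighted_cut for mq, ws in scores]
```

The minimum quantile score catches a cell that is poor on any single indicator, while the weighted score summarizes overall quality; combining them with an "or" is one design choice among several.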
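The abstract states only that logarithmic normalization is used after quality control. A common form, assumed here, divides each cell's counts by a size factor proportional to its library size (scaled so the factors average 1 across cells) and then takes log2 with a pseudo-count of 1. This is a sketch of that convention; the thesis may use a different scale factor or log base.

```python
import math
from typing import List, Sequence


def log_normalize(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Library-size log-normalization of a cells x genes count matrix.

    Size factors are library sizes divided by their mean, so normalized
    values stay on roughly the scale of the raw counts before the log.
    """
    lib = [sum(row) for row in counts]
    if any(total == 0 for total in lib):
        raise ValueError("cells with zero counts should be removed by QC first")
    mean_lib = sum(lib) / len(lib)
    size_factors = [total / mean_lib for total in lib]
    # log2(count / size_factor + 1); the +1 pseudo-count keeps zeros at zero.
    return [[math.log2(c / sf + 1) for c in row]
            for row, sf in zip(counts, size_factors)]
```

Two cells whose counts differ only by sequencing depth (for example [1, 3] and [2, 6]) map to the same normalized profile, which is what makes cells comparable.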
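The Adjusted Rand Index used to evaluate the clustering compares two partitions of the same cells by counting pairs of cells placed together, then corrects the Rand index for chance: ARI = (index - expected index) / (max index - expected index). It equals 1 for identical partitions (up to relabeling) and is near 0 for random labelings. The standard-library sketch below is an independent implementation of that formula, not the thesis's code; scikit-learn's `adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    total = comb(n, 2)  # number of item pairs
    if total == 0:
        return 1.0  # fewer than two items: partitions trivially agree
    contingency = Counter(zip(labels_true, labels_pred))
    sum_ij = sum(comb(c, 2) for c in contingency.values())  # pairs together in both
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_ij - expected) / (max_index - expected)


# Example: splitting one true cluster in two gives 4/7, about 0.571.
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Because the index is invariant to label names, cluster IDs from the pipeline can be compared directly with annotated cell types without matching labels first.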
Keywords/Search Tags: quality control, single cell transcriptome sequencing, feature selection, cell heterogeneity, normalization