Pseudo-time Trajectory Inference Algorithm For Single-cell RNA-seq Data

Posted on: 2023-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: T H Guan
Full Text: PDF
GTID: 2530306818997499
Subject: Mathematics
Abstract/Summary:
Traditional bulk sequencing methods measure the gene expression of entire cell populations but cannot capture the heterogeneity of gene expression across tissue types and cell subpopulations. Advances in single-cell RNA-seq technology now make it possible to obtain gene expression information at the single-cell level quickly, efficiently, and at low cost. The large volume of single-cell RNA-seq data produced can reflect cellular heterogeneity, reveal differences between tissues and cell subpopulations, provide solid data support for studying cell differentiation and disease development, and offer a new perspective on cell fate determination and the mechanisms of cell development. This thesis studies cell trajectory inference based on single-cell RNA-seq data and provides theoretical insights for understanding gene function and cell behavior, analyzing gene expression mechanisms, exploring the dynamics of cell development, and studying disease pathology. The main contents are as follows:

(1) A gene interaction network entropy (GINE) index is proposed to quantify the state of cell differentiation. Unlike previous approaches that rely only on information about each gene itself, this method also takes the influence of neighboring genes into account. Specifically, the Pearson correlation coefficient is used to construct a gene interaction network and screen for associated genes, and a sample-specific gene interaction network entropy matrix is then constructed, exploiting the ability of entropy to quantify uncertainty. The index is validated on two datasets, head and neck squamous cell carcinoma and chronic myeloid leukemia, through dimensionality-reduction visualization. The results confirm the validity of GINE: the method performs well in distinguishing normal cells from cancer cells, and even cells at different stages. In addition, the GINE index is used to characterize the response of chronic myeloid leukemia patients to drug stimulation, identify the critical period of treatment, and screen a key gene set, which literature validation and gene enrichment analysis confirm to be highly correlated with the disease. This indicates that the GINE approach is useful for studying the regulatory mechanisms of chronic myeloid leukemia at the gene level and has reference value for clinical research and treatment.

(2) Based on GINE, a cell trajectory inference algorithm named scGinet is proposed. It constructs cell trajectories without time-point samples or external information, which helps to resolve the regulation of single-cell differentiation and to discover rare intermediate-state cell types. The algorithm uses cell clusters as the basic unit for constructing trajectories in order to mitigate the noise inherent in single-cell RNA-seq data, and uses the Chu-Liu algorithm to construct a directed minimum spanning tree from gene expression and differentiation-state patterns, which ensures that the connections between cell clusters are directed. For pseudo-time assignment, cells are projected using the theory of the Apollonius circle, which avoids some of the problems of orthogonal projection. scGinet is applied to datasets of human skeletal muscle myoblasts and mouse lung epithelial cells; the corresponding trajectory structures are successfully constructed and are consistent with known biological processes such as cell development. When the cells are ordered along the developmental process, their GINE index shows a decreasing trend, indicating that the index can measure the state of cell differentiation. The algorithm is also validated against known marker genes: the expression patterns of all these markers along the cell trajectory correspond to the process of cell differentiation and development, confirming the biological reliability of the algorithm. Finally, the accuracy of scGinet is assessed by using external information such as cell type or cell time labels as a benchmark and comparing it with several other trajectory inference algorithms. The results show that scGinet achieves better overall accuracy on both single-cell RNA-seq datasets.
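
The abstract does not state the GINE formula, so the following is only a rough sketch of the general idea in item (1): link genes whose Pearson correlation is strong, then score a gene in a single cell by the entropy of its neighbors' expression. The 0.8 threshold, the normalized Shannon entropy, and the function names are all assumptions made for illustration and may differ from the definition used in the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def build_network(expr, threshold=0.8):
    """Link two genes when |PCC| >= threshold (threshold is an assumed value).

    expr maps gene -> list of expression values, one per cell.
    Returns gene -> list of neighboring genes.
    """
    genes = list(expr)
    neighbors = {g: [] for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                neighbors[g].append(h)
                neighbors[h].append(g)
    return neighbors


def local_entropy(expr, neighbors, gene, cell):
    """Normalized Shannon entropy of the neighbors' expression in one cell.

    This is an assumed stand-in for a per-gene, per-sample entropy entry;
    it returns a value in [0, 1], or 0.0 when it cannot be computed.
    """
    values = [expr[h][cell] for h in neighbors[gene]]
    total = sum(values)
    if len(values) < 2 or total <= 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(values))


# Toy expression table (hypothetical genes, four cells).
expr = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 5, 1, 5],
}
neighbors = build_network(expr)
score = local_entropy(expr, neighbors, "g1", cell=0)
```

Computing `local_entropy` for every gene and every cell would give a genes-by-cells matrix, which is one plausible reading of the "sample-specific entropy matrix" mentioned above.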
Keywords/Search Tags: single-cell RNA-seq, cell trajectory inference, directed minimum spanning tree, Chu-Liu algorithm
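
The directed minimum spanning tree step listed in the keywords can be illustrated with a generic textbook version of the Chu-Liu/Edmonds algorithm. This is a minimal sketch, not code from the thesis: the cluster names, the edge weights, and the choice of root are hypothetical, and the abstract does not say how scGinet derives its weights from gene expression and differentiation state.

```python
def find_cycle(parent):
    """Return the set of nodes on a cycle in a child -> parent map, or None."""
    for start in parent:
        path, on_path = [], set()
        node = start
        while node in parent and node not in on_path:
            on_path.add(node)
            path.append(node)
            node = parent[node]
        if node in on_path:  # walked back onto the current path
            return set(path[path.index(node):])
    return None


def min_arborescence(nodes, edges, root):
    """Chu-Liu/Edmonds minimum-weight spanning arborescence rooted at `root`.

    edges maps (source, target) -> weight; returns a child -> parent dict.
    """
    # Step 1: every non-root node picks its cheapest incoming edge.
    best_in = {}
    for (u, v), w in edges.items():
        if v == root or u == v:
            continue
        if v not in best_in or w < edges[(best_in[v], v)]:
            best_in[v] = u
    if any(n != root and n not in best_in for n in nodes):
        raise ValueError("a non-root node has no incoming edge")

    cycle = find_cycle(best_in)
    if cycle is None:
        return best_in  # the greedy choice is already a tree

    # Step 2: contract the cycle into one super-node. An edge entering the
    # cycle is reweighted by the cost of the cycle edge it would replace.
    super_node = object()
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in cycle and v in cycle:
            continue
        if v in cycle:
            key, w2 = (u, super_node), w - edges[(best_in[v], v)]
        elif u in cycle:
            key, w2 = (super_node, v), w
        else:
            key, w2 = (u, v), w
        if key not in new_edges or w2 < new_edges[key]:
            new_edges[key], origin[key] = w2, (u, v)
    new_nodes = [n for n in nodes if n not in cycle] + [super_node]

    # Step 3: solve the contracted problem, then map its edges back.
    contracted = min_arborescence(new_nodes, new_edges, root)
    parent = {}
    for v, u in contracted.items():
        src, dst = origin[(u, v)]
        parent[dst] = src
    for v in cycle:  # keep every cycle edge except the one that was broken
        parent.setdefault(v, best_in[v])
    return parent


# Hypothetical cluster graph: "stem" is the assumed root cluster.
edges = {
    ("stem", "a"): 5.0, ("stem", "b"): 6.0,
    ("a", "b"): 1.0, ("b", "a"): 1.0,
    ("b", "c"): 2.0, ("a", "c"): 4.0,
}
tree = min_arborescence(["stem", "a", "b", "c"], edges, "stem")
total = sum(edges[(p, c)] for c, p in tree.items())
```

In this toy graph the cheapest incoming edges of `a` and `b` form a cycle, so the algorithm contracts it and breaks it with the edge from `stem` to `a`. The result is the chain stem, a, b, c.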