Research On Single-cell RNA Sequencing Data Analysis Method Based On LDA Model

Posted on: 2022-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yang
GTID: 2480306572457204
Subject: Biomedical engineering
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) can determine the transcriptome state of each cell in a sample composed of tens or hundreds of thousands of cells. By analyzing differences in gene expression levels between cells, researchers can find biologically meaningful cell subtypes, which is of great significance to research in fields such as oncology, immunology and neuroscience. In single-cell sequencing experiments, the original RNA content of each cell is limited and gene expression is highly cell-specific, so scRNA-seq data are often noisy and sparse. Existing analysis methods are not specifically optimized for these problems, and low signal-to-noise ratio and high sparsity remain major challenges for scRNA-seq data analysis.

Latent Dirichlet Allocation (LDA) is a probabilistic topic model for unsupervised learning. By introducing a hidden variable, the "topic", into the two-layer structure of observed samples and their feature attributes, the LDA model can uncover latent patterns in massive data and has clear advantages on complex, sparse or noisy datasets. This study proposes a scRNA-seq data analysis method based on the LDA model. We studied the performance and efficiency of the method on scRNA-seq datasets, and the interpretability of the LDA results from a biological perspective. The details are as follows.

Based on the LDA model, the study constructed a scRNA-seq data analysis workflow. Because the LDA model can automatically mine hidden patterns in a dataset, the workflow requires neither normalization nor selection of highly variable genes. First, the cell-topic relationship and the topic-gene relationship are obtained by training the LDA model. Then, based on the topic probability vector of each cell, the k-medoids algorithm divides all cells into cell clusters. Finally, the workflow annotates the cell type of each cell cluster according to the results of the LDA model.

Using seven "gold standard" datasets of human lung adenocarcinoma cell lines, the study compared the performance of the LDA model with a variety of common scRNA-seq data analysis methods. Taking into account a series of evaluation metrics, including accuracy and recall, the results showed that the LDA-based method performed best on all datasets and achieved a good balance between under-clustering and over-clustering.

The study then used the LDA model for an empirical analysis of two real scRNA-seq datasets. The first is a human melanoma dataset. The analysis showed that the LDA model can distinguish not only malignant cells from different patients and stromal cells of different cell types, but also the subtypes of tumor-infiltrating T cells. In other words, the LDA model can simultaneously identify, within one dataset, cell types that differ greatly in abundance and that sit at multiple levels of functional specialization. The second is a human embryonic thymus development dataset. The analysis showed that the LDA model can reconstruct the classical T cell differentiation trajectory. In addition, by analyzing the two datasets, the study demonstrated that the LDA model can accurately discover the marker genes of cell types and thereby support cell type annotation.

The study used a multithreaded LDA software package to carry out the scRNA-seq data analysis, and generated large-scale simulated scRNA-seq datasets to test the running time and efficiency of the method. The tests showed that the multithreaded LDA model is substantially more efficient than a single-threaded implementation, and that the LDA-based scRNA-seq analysis method is usable on large-scale scRNA-seq datasets.
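The clustering and annotation steps of the workflow described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the LDA model has already been trained, and the cell-topic matrix `cell_topics`, the topic-gene matrix `topic_gene` and the gene names are hypothetical toy values. The k-medoids routine is a simple alternating variant with a deterministic farthest-point initialization, chosen here only for reproducibility, and labelling a cluster by the top-weighted genes of its medoid's dominant topic is one plausible reading of "annotation according to the results of the LDA model".

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids; returns (medoid indices, cluster label per point)."""
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Deterministic init (an assumption of this sketch): start from the most
    # central point, then repeatedly add the point farthest from all medoids.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    def assign(meds):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster became empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


def top_genes(topic_gene, genes, topic, n=2):
    """Names of the n highest-weighted genes for one topic."""
    ranked = sorted(range(len(genes)), key=lambda g: topic_gene[topic][g], reverse=True)
    return [genes[g] for g in ranked[:n]]


# Toy cell-topic probabilities (rows: cells, columns: topics); hypothetical values.
cell_topics = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.10, 0.10, 0.80],
]
# Toy topic-gene weights (rows: topics, columns: genes); hypothetical values.
genes = ["geneA", "geneB", "geneC", "geneD"]
topic_gene = [
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.60, 0.30],
    [0.30, 0.05, 0.05, 0.60],
]

medoids, labels = k_medoids(cell_topics, k=3)
for cluster, m in enumerate(medoids):
    # Dominant topic of the cluster's medoid cell.
    topic = max(range(len(cell_topics[m])), key=lambda t: cell_topics[m][t])
    print(cluster, "medoid cell", m, "dominant topic", topic,
          "candidate markers", top_genes(topic_gene, genes, topic))
```

In practice the two matrices would come from a trained LDA model fitted on the raw count matrix, and the number of clusters `k` would have to be chosen separately; neither step is shown here.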
Keywords/Search Tags: single-cell sequencing, clustering analysis, Latent Dirichlet Allocation, scRNA-seq