
Study On Clustering Methods For Single Cell RNA-seq Data

Posted on: 2020-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Yang
GTID: 2370330575481225
Subject: Computer technology
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) allows biologists to collect large amounts of RNA-seq data describing the transcriptome of individual cells. Unsupervised clustering is important for analyzing these data because it can be used to identify putative cell types, and defining cell types by transcriptome similarity has become one of the most powerful applications of single-cell RNA sequencing. In a broad sense, the purpose of unsupervised clustering is to find the natural grouping of a set of objects. Defining cell types on the basis of the transcriptome is attractive because unsupervised clustering provides a data-driven, consistent and unbiased approach. Based on this idea, several sequencing projects have been created that aim to establish a comprehensive reference for all cell types of an organism or tissue at different stages of development. Unsupervised cell clustering is one of the key computational challenges in making such cell mapping practical.

Many scRNA-seq data sets are very large, reaching hundreds of thousands of cells, which presents both challenges and opportunities. Single-cell RNA-seq expression data sets are among the most complex encountered in genomics: even the smallest scRNA-seq experiment samples hundreds of cells and measures the expression level of over 10,000 genes in each cell. Large data sets not only support highly accurate analysis but also improve the ability to detect rare cell types. As a result, the efficiency and accuracy of clustering have become a major challenge in data analysis.

To obtain more accurate clustering results on scRNA-seq data sets and to facilitate further analysis of the biological data by researchers, we studied clustering methods for single-cell RNA-seq data. Since dimensionality reduction can reduce noise, reveal low-dimensional manifold structure and speed up data processing, we reduce the dimensionality of the data before clustering. The Louvain algorithm is a community detection algorithm for graph data and is regarded as one of the best-performing algorithms of its kind. Based on the characteristics of scRNA-seq data, we combined the k-nearest-neighbor (KNN) idea with Louvain so that it can be better applied to scRNA-seq data. Two large scRNA-seq data sets were divided into four experiments; the clustering results of each were analyzed quantitatively, and t-SNE was used for visualization. The results show that the clustering accuracy of the Louvain algorithm is satisfactory. We also analyzed two other clustering methods and found that hierarchical clustering performs well in distinguishing large samples. In addition, there is a large gap between the clustering results on the original data without dimensionality reduction and the results after dimensionality reduction, which confirms the necessity of dimensionality reduction in the scRNA-seq clustering process.
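The abstract does not spell out how the KNN idea is combined with Louvain. One common reading, assumed here, is to connect each cell to its k nearest neighbors and then run modularity-based community detection on that graph. The following is a minimal pure-Python sketch of that reading, not the thesis implementation: the function names (`knn_graph`, `modularity`, `louvain_local_moving`) and the toy 2-D points are hypothetical, the dimensionality reduction step is skipped, and only the first (local moving) phase of Louvain is shown, without the graph aggregation phase of the full algorithm.

```python
import math


def knn_graph(points, k):
    """Build an undirected, unweighted KNN graph as {node: set(neighbours)}."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            adj[i].add(j)
            adj[j].add(i)  # symmetrise so the graph is undirected
    return adj


def modularity(adj, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ]."""
    m = sum(len(nb) for nb in adj.values()) / 2  # number of edges
    q = 0.0
    for c in set(community.values()):
        nodes = [v for v in adj if community[v] == c]
        # Each internal edge is seen from both endpoints, hence the / 2.
        inside = sum(1 for v in nodes for u in adj[v] if community[u] == c) / 2
        degree = sum(len(adj[v]) for v in nodes)
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def louvain_local_moving(adj):
    """Phase 1 of Louvain: greedily move nodes to the neighbouring community
    that increases modularity most, until no single move helps.
    Modularity is recomputed from scratch for clarity, not for speed."""
    community = {v: v for v in adj}  # start with one community per node
    improved = True
    while improved:
        improved = False
        for v in adj:
            best_c, best_q = community[v], modularity(adj, community)
            for c in {community[u] for u in adj[v]}:
                old = community[v]
                community[v] = c          # tentatively move v
                q = modularity(adj, community)
                community[v] = old        # undo the tentative move
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            if best_c != community[v]:
                community[v] = best_c
                improved = True
    return community


# Toy stand-in for (already dimension-reduced) cells: two separated groups.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
graph = knn_graph(points, k=2)
labels = louvain_local_moving(graph)
print(labels, round(modularity(graph, labels), 3))
```

In practice one would apply the KNN step to the dimension-reduced expression matrix rather than to raw counts, and use an optimized Louvain implementation, since recomputing modularity for every candidate move does not scale to hundreds of thousands of cells.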
Keywords/Search Tags:Single Cell RNA-seq, Dimensionality Reduction, Cluster, Louvain