
Research On Clustering Algorithm For Dropout Of Single Cell Data

Posted on: 2021-04-24  Degree: Master  Type: Thesis
Country: China  Candidate: J L Bian
GTID: 2370330614450437  Subject: Basic mathematics
Abstract/Summary:
Single-cell RNA sequencing (scRNA-seq) has recently come into wide use. As the technology has developed, researchers can obtain large amounts of single-cell gene expression data, which lay the foundation for downstream analysis and biological research. These data carry a great deal of biological information about genes. Drawing meaningful biological conclusions from them and revealing the relationships between the genes of cells have become the focus of current research, so the analysis of scRNA-seq data is of great significance. Cluster analysis is currently an important method for studying these data. However, because of the dropout phenomenon in the data, clustering algorithms cannot be applied to them directly, and doing so gives unsatisfactory results. The purpose of this thesis is therefore to study clustering algorithms that address dropout in single-cell gene expression data. The thesis examines the current mainstream dimensionality-reduction-based clustering algorithms for these data and proposes an efficient and accurate clustering algorithm suited to the characteristics of the data. The idea of the algorithm is to combine dimensionality reduction with the handling of dropout and to apply the result to cluster analysis. Dimensionality reduction serves as the preprocessing step for clustering. During dimensionality reduction, the dropout problem is addressed by an improved distance measure and by estimating (imputing) the missing values. On this basis, the thesis selects a clustering model that matches the distribution of the data and improves the clustering initialization algorithm, which raises accuracy and reduces running time. In the experimental part, two simulated datasets and five real datasets are used for empirical analysis, and the results on each dataset are presented. A series of experiments then compares the strengths and weaknesses of different algorithms. Compared with the other algorithms, the improved algorithm achieved better results in the experimental analysis. The verification shows that the algorithm alleviates the dropout problem to some extent and improves both accuracy and running speed. Finally, the thesis closes with a summary that offers extensible ideas for future algorithm research.
Keywords/Search Tags: single-cell RNA sequencing, clustering algorithm, dropout phenomenon, distance measurement, estimated interpolation, cluster initialization
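The abstract describes a pipeline with four parts: a dropout-aware distance, imputation, dimensionality reduction, and clustering with an improved initialization. It does not give the exact distance measure, imputation rule, or clustering model. The following is a minimal NumPy sketch of a pipeline with the same shape, built on these assumptions: (a) a hypothetical distance that compares two cells only on genes expressed in both, (b) k-nearest-neighbour imputation of zero entries, (c) PCA for dimensionality reduction, and (d) k-means with k-means++ seeding, swapped in as a stand-in for the thesis's distribution-matched model and improved initialization. All function names are hypothetical and are not the author's method.

```python
import numpy as np


def dropout_aware_distance(X):
    """Hypothetical dropout-aware metric: compare two cells only on genes
    expressed (non-zero) in both, so a zero that may be a dropout does not
    count as evidence of dissimilarity."""
    n = X.shape[0]
    expressed = X > 0
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            shared = expressed[i] & expressed[j]
            if shared.any():
                diff = X[i, shared] - X[j, shared]
                d = np.sqrt(np.mean(diff ** 2))  # RMS difference on shared genes
            else:
                d = np.inf  # no shared expressed genes: treat as far apart
            D[i, j] = D[j, i] = d
    return D


def knn_impute_zeros(X, D, k=3):
    """Replace each zero with the mean non-zero value of that gene among the
    cell's k nearest neighbours under D; zeros no neighbour expresses stay 0."""
    X_imp = X.copy()
    for i in range(X.shape[0]):
        order = np.argsort(D[i])
        neighbours = order[order != i][:k]
        for g in np.flatnonzero(X[i] == 0):
            vals = X[neighbours, g]
            vals = vals[vals > 0]
            if vals.size:
                X_imp[i, g] = vals.mean()
    return X_imp


def pca(X, n_components=2):
    """Project mean-centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans_pp_init(Z, n_clusters, rng):
    """k-means++ seeding: each new centre is sampled with probability
    proportional to its squared distance from the nearest existing centre."""
    centers = [Z[rng.integers(len(Z))]]
    for _ in range(1, n_clusters):
        d2 = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum() if d2.sum() > 0 else None  # uniform if all equal
        centers.append(Z[rng.choice(len(Z), p=probs)])
    return np.array(centers)


def kmeans(Z, n_clusters, n_iter=100, seed=0):
    """Lloyd's k-means iterations starting from k-means++ centres."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(Z, n_clusters, rng)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_cells(counts, n_clusters, k=3, n_components=2, seed=0):
    """Cells-by-genes counts -> log1p -> dropout-aware kNN imputation -> PCA -> k-means."""
    X = np.log1p(counts)
    D = dropout_aware_distance(X)
    X_imp = knn_impute_zeros(X, D, k)
    Z = pca(X_imp, n_components)
    return kmeans(Z, n_clusters, seed=seed)
```

Calling `cluster_cells(counts, n_clusters=2)` on a cells-by-genes count matrix returns one integer label per cell. k-means++ appears here only because it is a widely known improved seeding scheme. The thesis's own initialization and clustering model may differ. The pairwise loop is O(n²) in the number of cells, so it suits small examples rather than full datasets.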