Cardiac Cell Classification Based On Single-cell RNA-sequencing Data

Posted on: 2022-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Huang
Full Text: PDF
GTID: 2480306773993109
Subject: Chemistry
Abstract/Summary:
In recent years, single-cell transcriptome sequencing has been increasingly used in cardiology, helping to build a detailed cardiac map, explore the pathogenesis of disease, and develop targeted drugs. Cell type identification is an important part of single-cell sequencing data analysis and the basis for downstream analysis, but it must cope with high-dimensional, sparse data. The mainstream approach relies on unsupervised clustering followed by annotation with marker genes, which suffers from low accuracy and inefficiency. To address this, this thesis studies data dimensionality reduction and the construction of supervised classification models. First, principal component analysis (PCA), UMAP, and locally linear embedding (LLE) are used to reduce the dimensionality and extract the key information. The visualization results show that UMAP has the highest operating efficiency, eliminates redundant gene information, and strengthens the differences between cell clusters. UMAP also gives the best overall classification performance, with a mean micro F1 of 0.955, which is 3.2% higher than with the traditional PCA dimensionality reduction method. Second, logistic regression, random forest, and LightGBM cell classification models are trained on a public gold-standard dataset, and the three heterogeneous learners are then integrated through stacking ensemble learning. The models are used to make predictions on unseen cross-batch and cross-lab datasets to verify their generalization. The experimental results show that UMAP dimensionality reduction combined with LightGBM performs best among the single models, achieving micro F1 values of 0.989 and 0.772 on the cross-batch and cross-lab test sets, respectively. In addition, stacking model fusion further improves accuracy and generalization, achieving a micro F1 value of 0.808 on the cross-lab test set, a 4.6% increase over the single LightGBM model. The supervised cell classification model enables knowledge transfer, provides clear and accurate cell type assignments for unknown datasets, and saves considerable manpower and material resources.
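The micro F1 values quoted in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the function names `micro_f1` and `relative_gain` and the toy cell type labels are assumptions made for illustration. Micro F1 pools true positives, false positives, and false negatives over all cell types before computing F1. The second helper computes a relative improvement; the reported 4.6% gain from stacking appears consistent with a relative increase from 0.772 to 0.808, although the abstract does not state how the percentage was calculated.

```python
from collections import Counter


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label multiclass predictions.

    Counts are pooled over all classes before computing F1.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct call for this cell type
        else:
            fp[pred] += 1    # predicted type gains a false positive
            fn[truth] += 1   # true type gains a false negative
    total_tp = sum(tp.values())
    denom = 2 * total_tp + sum(fp.values()) + sum(fn.values())
    return 2 * total_tp / denom if denom else 0.0


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Toy labels (hypothetical, not taken from the thesis data).
truth = ["cardiomyocyte", "fibroblast", "endothelial", "fibroblast"]
preds = ["cardiomyocyte", "fibroblast", "fibroblast", "fibroblast"]
print(micro_f1(truth, preds))
print(relative_gain(0.772, 0.808))
```

When every cell receives exactly one predicted label, as here, micro F1 equals plain accuracy, which is why it is a natural summary score for cell type classification.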
Keywords/Search Tags: Single-cell transcriptome sequencing, Dimensionality reduction, LightGBM, Stacking