
Development Of A New Tool For Differential Gene Selection

Posted on: 2022-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Zhang
Full Text: PDF
GTID: 2510306350496064
Subject: Stem Cells and Regenerative Medicine
Abstract/Summary:
Purpose: In recent years, next-generation sequencing (NGS) technology has developed rapidly. Researchers now use NGS-based genomics, transcriptomics, and epigenomics to study individual cells, and single-cell RNA sequencing (scRNA-seq) in particular makes it possible to study the transcriptomes of thousands of single cells in complex multicellular organisms. In addition, more sensitive and automated methods are continuously being developed, aiming to provide better data while reducing time and operating costs. High-throughput sequencing can generate millions of reads or more at a time, so how to process the data so that they reflect biological significance has become a focus of attention. The data analysis process mainly includes quality control, mapping, normalization, selection of highly variable genes (HVGs), and subsequent analysis. Among these steps, the selection of highly variable genes has a great impact on downstream analysis, including dimensionality reduction and clustering. In a gene expression data set, the number of measurable genes in each sample generally reaches thousands or even tens of thousands, but only a small fraction of these genes is biologically meaningful and can be used to distinguish different cell types. The choice of method for selecting highly variable genes is therefore very important. Although researchers have developed more and more tools to select highly variable genes, there is still a lack of evaluation of their performance and of general methods.

The hematopoietic system participates in maintaining the normal physiological activities of the body and is closely related to aging, disease, and even the occurrence and development of tumors. Hematopoietic stem cells (HSCs) are adult stem cells that exist in the blood system and have long-term self-renewal and differentiation potential. They can differentiate into a variety of hematopoietic progenitor cells. Research on stem/progenitor cells of the hematopoietic system provides important guidance for research on other types of stem cells and on various diseases. In addition to stem/progenitor cells, the hematopoietic system develops along multiple lineages, each containing multiple cell types. The various precursor cells and mature cells each perform their own functions and together build a complete hematopoietic environment. The transcriptomes of hematopoietic stem and progenitor cells are similar to one another, whereas the transcriptomes of mature cells of different lineages differ considerably. A good gene selection algorithm should be applicable to various types of data: regardless of whether the cell composition of the data is similar or diverse, it should accurately select the most biologically significant highly variable genes so as to facilitate subsequent dimensionality reduction and clustering. The purpose of this study is to use transcriptome data from various cells of the hematopoietic system to evaluate the highly variable gene selection methods in common use today and, on this basis, to design and develop a new tool that selects the most biologically significant highly variable genes and eliminates noise as much as possible.

Content: Different gene selection methods were compared on several indicators. An R package, SIEVE, was developed. It can be used in combination with other methods during gene selection analysis to better remove noise and extract highly variable genes for subsequent analysis. The performance of SIEVE was evaluated.

Methods: The transcriptome data were taken from hematopoietic cell expression profiles previously published by our group. The data were preprocessed using Seurat v4. Highly variable genes were selected with M3Drop, scmap, scran, singleCellHaystack, Seurat, and ROGUE. The methods were evaluated in terms of purity, repeatability, and accuracy. The performance of SIEVE was tested in terms of accuracy and the ratio of marker genes.

Results: The performance of nine commonly used highly variable gene selection methods was compared. singleCellHaystack was found to be superior to the other methods in terms of repeatability and accuracy. However, this method tends to select genes with high expression levels. We also proposed a new strategy, SIEVE, which uses multiple rounds of random sampling to minimize random noise and determine a reliable set of highly variable genes. In addition, SIEVE can retain information about genes with low expression levels, and for methods with lower reproducibility, SIEVE can significantly improve the accuracy of single-cell classification.
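
The multi-round random sampling idea described for SIEVE can be illustrated with a minimal Python sketch. This is not the SIEVE R package or its actual algorithm: the function names (top_variable_genes, stable_hvgs), the variance-to-mean ranking used as a stand-in for an HVG method, and the parameters (rounds, frac, min_freq) are all assumptions chosen for illustration. The sketch repeatedly subsamples cells, runs a simple HVG selection on each subsample, and keeps only the genes that are selected in most rounds.

```python
import random
from statistics import mean, pvariance


def top_variable_genes(expr, cells, n_top):
    """Rank genes by variance/mean over the given cells (illustrative
    stand-in for a real HVG method) and return the n_top best."""
    scores = {}
    for gene, values in expr.items():
        sub = [values[c] for c in cells]
        mu = mean(sub)
        scores[gene] = pvariance(sub) / mu if mu > 0 else 0.0
    # Sort by descending score; break ties by gene name for determinism.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return set(ranked[:n_top])


def stable_hvgs(expr, n_cells, n_top=2, rounds=50, frac=0.7,
                min_freq=0.8, seed=0):
    """Keep genes selected in at least min_freq of the subsampling rounds.
    All parameter names and defaults are hypothetical."""
    rng = random.Random(seed)
    counts = {gene: 0 for gene in expr}
    k = max(2, int(frac * n_cells))  # cells drawn per round
    for _ in range(rounds):
        cells = rng.sample(range(n_cells), k)  # sample without replacement
        for gene in top_variable_genes(expr, cells, n_top):
            counts[gene] += 1
    return {g for g, c in counts.items() if c / rounds >= min_freq}


# Toy expression matrix (gene -> per-cell values); made-up data.
n_cells = 20
toy = {
    "marker_a": [0.0 if i % 2 == 0 else 10.0 for i in range(n_cells)],
    "marker_b": [8.0 if i < 10 else 0.0 for i in range(n_cells)],
    "flat_1": [5.0] * n_cells,
    "flat_2": [3.0] * n_cells,
}
selected = stable_hvgs(toy, n_cells)
```

In this toy data set only marker_a and marker_b vary across cells, so they are the genes retained. In practice the ranking step would be replaced by whichever HVG selection method is being combined with the resampling.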
Keywords/Search Tags: Hematopoietic stem/progenitor cells, HVG selection, scRNA-seq