| With the breakthrough of immune checkpoint theory and the discovery of two important cancer-related immune checkpoints, CTLA-4 and PD-1, immunotherapy has become an important method for cancer treatment. It is well established that a patient's response to immunotherapy is closely related to the abundance of various immune cells within the tumor microenvironment. However, the complex cellular composition of the tumor microenvironment makes it difficult to determine the proportion of each cell type. Although techniques such as flow cytometry can measure cell abundance in a sample through cell sorting, these methods are complicated and depend on the availability of large numbers of high-quality cells, which hinders their applicability in routine clinical and research procedures. The development of sequencing technology has made it possible to replace the traditional manual methods based on cell sorting with computational methods for estimating immune cell abundance. However, owing to the complex composition of heterogeneous tissues and technical variation across platforms, accurate and robust estimation of immune cell abundance in heterogeneous samples remains a very challenging task. To address this challenge, this paper proposes a deep learning-based immune cell abundance estimation method that achieves accurate and robust prediction of cell type proportions. The proposed method could assist clinical judgment in disease diagnosis and treatment, such as immunotherapy, and promote research on tumor immunotherapy. This paper focuses on deconvolution methods for estimating immune cell abundance in heterogeneous samples. To improve prediction accuracy and robustness given the complex composition of, and unknown noise in, the expression profiles of heterogeneous samples, Chapter 3 introduces a deep learning deconvolution method based on multi-task learning, with training data created by in silico mixing of data from three platforms:
microarray, high-throughput RNA-seq, and single-cell RNA-seq. An ensemble learning method integrates the predictions of multiple models to obtain the final immune cell abundance estimate. This deep learning-based method eliminates the need to select features in advance. It can therefore predict cell types that lack feature genes, which is a challenging task for traditional methods. In addition, the model achieves cross-platform performance by combining multi-platform training data. On nine published expression data sets from peripheral blood with ground-truth cell composition, this method outperformed existing methods in both accuracy and robustness. The proposed method is therefore applicable to a wide range of clinical settings that require cell abundance estimation. To further improve the accuracy of cell abundance estimation, Chapter 4 proposes a new data augmentation strategy. The core idea is to create training data that match the distribution of the expression data to be tested. Briefly, a subset of samples from the same batch serve as calibration samples, which are measured by flow cytometry to obtain the proportions of the corresponding immune cells. The expression data of the calibration samples are then augmented with expression data from purified samples or scRNA-seq data and used to train the deep neural network for high-precision estimation of cell abundance. Experimental results showed that this method outperforms existing cell type proportion estimation methods by a significant margin on three expression data sets from peripheral blood. With its accuracy and robustness, the proposed method can be used in hospital departments involved in cancer treatment. It can quickly capture changes in the immune cell composition of patients with tumor lesions and provide data to support treatment and medication decisions; it therefore has potential clinical value and offers new directions for related research. |
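
The in silico mixing and ensemble steps summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how proportions are sampled, what noise model is used, or how model outputs are combined, so the Dirichlet sampling, the log-normal noise, the simple averaging, and all function names below (`sample_fractions`, `simulate_mixture`, `ensemble_average`) are assumptions chosen for illustration.

```python
import random


def sample_fractions(n_types, rng):
    """Draw cell type fractions that sum to 1.

    Assumption: a flat Dirichlet(1, ..., 1), obtained by normalizing
    independent Gamma(1, 1) draws. The source does not state a prior.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n_types)]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_mixture(signatures, rng, noise_sd=0.1):
    """Create one synthetic bulk sample from pure cell type profiles.

    signatures: one expression profile (list of floats) per cell type,
    all of the same length (one value per gene).
    Returns (mixture, fractions); fractions is the training label.
    """
    fractions = sample_fractions(len(signatures), rng)
    n_genes = len(signatures[0])
    mixture = []
    for g in range(n_genes):
        # Linear mixing: bulk expression is the fraction-weighted sum.
        value = sum(f * sig[g] for f, sig in zip(fractions, signatures))
        # Assumed multiplicative log-normal noise as a stand-in for
        # platform and technical variation.
        mixture.append(value * rng.lognormvariate(0.0, noise_sd))
    return mixture, fractions


def ensemble_average(predictions):
    """Combine per-model fraction predictions for one sample.

    Assumption: plain averaging, then clipping negatives and
    renormalizing so the result is a valid set of proportions.
    """
    n_types = len(predictions[0])
    mean = [sum(p[k] for p in predictions) / len(predictions)
            for k in range(n_types)]
    mean = [max(m, 0.0) for m in mean]
    total = sum(mean)
    return [m / total for m in mean]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical cell types measured over three genes.
    signatures = [[10.0, 1.0, 0.5], [0.5, 8.0, 1.0]]
    mixture, fractions = simulate_mixture(signatures, rng)
    print("fractions:", fractions)
    print("mixture:", mixture)
```

In practice many such (mixture, fractions) pairs would be generated per platform and used as training inputs and labels for the neural network; the calibration-based augmentation of Chapter 4 would replace or supplement the pure profiles with samples whose proportions are known from flow cytometry.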