Deconvolution Study Of Tumor Composition Using Partially Available DNA Methylation Data

Posted on: 2024-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: D Q He
Full Text: PDF
GTID: 2544307139956109
Subject: Computer technology
Abstract/Summary:
In cancer treatment, the nature and degree of cellular infiltration have a significant impact on tumor response and prognosis, making the proportions of cell types in tumor tissue an essential consideration in diagnosis and in the development of treatment strategies. To better understand the cellular composition of tumors, bioinformatics researchers have developed computational methods for analyzing intra-tumor heterogeneity. These methods can replace expensive and time-consuming experimental approaches and provide a faster analysis of tumor cell composition. Currently, they fall into two groups: reference-based and reference-free methods. Reference-based methods can estimate cell proportions more accurately, but they require complete reference data as input. In practice, complete reference data are difficult to obtain and only partial data are available, so existing reference-based methods do not perform satisfactorily. Reference-free methods require no reference data as input, but their accuracy in estimating cell type proportions is low. It is therefore necessary to develop a more accurate model that predicts the cellular composition of tumor tissue from partial reference data.

DNA methylation is an epigenetic modification widely found in eukaryotic genomes that regulates gene expression and cell differentiation through the attachment of methyl groups to DNA. In tumor tissues, DNA methylation patterns are frequently altered, which may lead to dysregulated gene expression and disturbed cell differentiation and in turn promote the formation and growth of cancer cells. Analyzing DNA methylation patterns in tumor tissue yields information about the proportions of cell types in that tissue. DNA methylation can also be used for tumor staging and prognosis prediction: comparing the methylation patterns of normal and tumor tissue can determine the type and subtype of a tumor, and specific methylation patterns can be correlated with patient prognosis, helping physicians predict survival and treatment response. These observations suggest that DNA methylation is important and useful for tumor research.

In this thesis, we present PRMeth, a model for cell type deconvolution using partially available DNA methylation data. PRMeth uses an iteratively optimized non-negative matrix factorization framework. It takes as input the DNA methylation profiles of a subset of the cell types present in readily available tissue mixtures (including blood and solid tumors), and it predicts both the methylation profiles of the unknown cell types and the proportions of all cell types. Specifically, the research consists of the following parts.

(1) Construction of the simulated dataset, the Zhang dataset, the whole blood dataset, and the TCGA dataset. First, we obtained IDAT files for several cell types from the GEO database and used the ChAMP pipeline to load, filter, and normalize these files and to remove batch effects, yielding the methylation profiles of the cell types. We then used the beta distribution and the Dirichlet distribution to generate the cell type proportion matrix and the methylation profiles of the tumor mixtures. Second, we collected the Zhang dataset and the whole blood dataset from previously published studies. Finally, we collected methylation profiles of skin melanoma (SKCM), breast invasive carcinoma (BRCA), acute myeloid leukemia (LAML), and thymic carcinoma (THYM) from the TCGA database and obtained methylation profiles of seven immune cell types from the GEO database.

(2) A model for cell type deconvolution using partially available DNA methylation data. To overcome the shortcomings of reference-based and reference-free methods, we propose PRMeth, a non-negative matrix factorization framework with iterative optimization. It takes as input the DNA methylation profiles of a readily available subset of the cell types in a tissue mixture and predicts the methylation profiles of the unknown cell types and the proportions of all cell types. We also analyzed three methods that support the PRMeth model: a method for initializing the feature matrix (RPMM), a method for selecting feature loci (coefficient of variation), and a method for determining the number of cell types (λ_BIC). On the simulated dataset, we tested the λ_BIC method, compared the RPMM method with five other methods, compared PRMeth with five other methods, and measured the computational performance of PRMeth.

(3) Validation of PRMeth on the Zhang dataset and the whole blood dataset, and a pan-cancer analysis on the TCGA dataset. We compared PRMeth with five other methods from several perspectives on the Zhang dataset and the whole blood dataset, and we carried out an analysis of immune infiltration patterns, an analysis of the degree of tumor heterogeneity, and a survival analysis on the TCGA dataset.

In this thesis, we systematically describe how the simulated dataset, the Zhang dataset, the whole blood dataset, and the TCGA dataset were collected; these provide valuable data resources for studying DNA methylation in bioinformatics. On the simulated dataset, the Zhang dataset, and the whole blood dataset, we compared PRMeth with five other methods and showed that PRMeth can effectively infer the proportions of all cell types and recover the methylation profiles of unknown cell types. Applying PRMeth to four tumor types from The Cancer Genome Atlas (TCGA) database, we found that the immune cell proportions it estimated were generally consistent with previous studies and biologically meaningful. Our method overcomes the difficulty of obtaining complete DNA methylation reference data while achieving satisfactory deconvolution accuracy, which will facilitate the exploration of new directions in cancer immunotherapy. PRMeth is implemented in R and is freely available from GitHub (https://github.com/hedingqin/PRMeth).
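
To make the partial-reference idea concrete, the following is a minimal sketch, not the PRMeth implementation (which is in R and whose exact update rules are not given in this abstract). It assumes a generic model Y ≈ WH, where Y is a loci-by-samples mixture matrix, W holds cell type methylation profiles, and H holds proportions whose columns sum to 1. The function names `simulate` and `partial_nmf`, the beta and Dirichlet parameters, and the use of standard multiplicative NMF updates with clipping are all illustrative assumptions. Python's standard library is used only for convenience.

```python
import random

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def simulate(m, n, k, seed=0):
    """Simulate m loci, n mixed samples, k cell types (assumed parameters)."""
    rng = random.Random(seed)
    # Methylation profiles: beta-distributed values in [0, 1].
    W = [[rng.betavariate(0.5, 0.5) for _ in range(k)] for _ in range(m)]
    # Proportions: Dirichlet(1, ..., 1) per sample via normalised gamma draws.
    samples = []
    for _ in range(n):
        g = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
        total = sum(g)
        samples.append([x / total for x in g])
    H = transpose(samples)          # k x n
    return W, H, matmul(W, H)       # Y = W H, m x n

def partial_nmf(Y, W_known, k, iters=200, seed=1, eps=1e-9):
    """Estimate H and the unknown columns of W; known columns stay fixed."""
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    k1 = len(W_known[0])            # number of cell types with a reference
    W = [list(W_known[i]) + [rng.uniform(0.1, 0.9) for _ in range(k - k1)]
         for i in range(m)]
    H = [[1.0 / k] * n for _ in range(k)]
    for _ in range(iters):
        # Multiplicative update for the proportions H.
        Wt = transpose(W)
        num, den = matmul(Wt, Y), matmul(matmul(Wt, W), H)
        H = [[H[c][j] * num[c][j] / (den[c][j] + eps) for j in range(n)]
             for c in range(k)]
        # Renormalise so each sample's proportions sum to 1.
        sums = [sum(H[c][j] for c in range(k)) + eps for j in range(n)]
        H = [[H[c][j] / sums[j] for j in range(n)] for c in range(k)]
        # Multiplicative update for the unknown profiles only, clipped to <= 1.
        Ht = transpose(H)
        num, den = matmul(Y, Ht), matmul(W, matmul(H, Ht))
        for i in range(m):
            for c in range(k1, k):
                W[i][c] = min(1.0, W[i][c] * num[i][c] / (den[i][c] + eps))
    return W, H

# Usage: four cell types, of which only the first three have reference profiles.
W_true, H_true, Y = simulate(m=60, n=15, k=4)
W_known = [row[:3] for row in W_true]
W_est, H_est = partial_nmf(Y, W_known, k=4)
```

Holding the known columns of W fixed is what distinguishes this setting from reference-free factorization: only the unknown profiles and the proportions are free parameters. Renormalization and clipping are simple heuristics here and do not guarantee that the reconstruction error decreases at every iteration.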
Keywords/Search Tags: cell type proportions, tumor heterogeneity, DNA methylation data, non-negative matrix factorization, immunotherapy