Rapidly developing high-throughput sequencing technology provides a convenient way to study cancer biology, and various computational models can extract key information about cancer from high-throughput sequencing data. In this dissertation, we address the application of high-throughput sequencing data and deep learning models to cancer research in two works.

In the first work, we explored the association between the microenvironmental features and the molecular characteristics of melanoma using melanoma transcriptome and genome data. Melanoma, a cancer that arises in melanocytes, is the deadliest form of skin cancer. The tumor microenvironment (TME), which involves the infiltration of multiple immune cell types into tumor tissue, plays an essential role in prognosis and in the clinical benefit of therapy. Chemokines and their receptors influence the migration and function of both tumor cells and immune cells. It is therefore important to understand the characteristics of immune infiltration and the interactions between chemokines and their receptors on immune cells. Molecular characteristics are also associated with the efficacy of melanoma therapy, so it is essential to explore immune characteristics and their association with molecular characteristics in melanoma. We collected 569 currently available melanoma samples and identified TME subtypes based on the developed immune signatures; of the four subtypes, TME type D had the best prognosis. We then divided the samples into high-immunity and low-immunity cohorts based on the immune score. Next, we estimated the immune cell composition of the two cohorts and found that 16 immune cell types were significantly more abundant in the high-immunity cohort. The 63 genes upregulated in the high-immunity cohort were enriched in immune-related biological processes, whereas the 384 downregulated genes were enriched in keratin, pigmentation, and epithelial cell processes. The correlations of chemokines and their receptors with immune cell infiltration, such as
CCR5-CCL4/CCL5 and CXCR3-CXCL9/CXCL10/CXCL11/CXCL13 axes, showed that the recruitment of 11 immune cell types, such as CD4 T cells and CD8 T cells, was modulated by chemokines and their receptors. The two driver genes CDKN2A and PRB2 had significantly different MAFs between the high-immunity and low-immunity cohorts.

In the second work, we developed an approach based on deep clustering to detect somatic SNVs and small indels from sequencing alignments. Somatic mutations promote the transformation of normal cells into cancer cells, and their accurate identification plays an important role in precision medicine for cancer patients. However, detecting somatic mutations accurately from sequencing data is difficult because of biological and technical noise. Existing detection methods fall into three categories (statistical, supervised machine learning-based, and deep learning-based), and each has limitations. Statistical tools employ Bayesian classifiers, which depend on prior knowledge. Machine learning-based methods require discriminative features that take domain knowledge to derive. Supervised deep learning-based methods learn feature representations from labeled data, which are expensive to obtain. We combined an autoencoder, which learns features automatically, with unsupervised clustering to develop DECSSV, the first deep clustering-based tool for detecting somatic SNVs and small indels from sequence alignments. We ran benchmarking experiments on real data and simulated tumors, and DECSSV achieved a good F1 score compared with four state-of-the-art somatic mutation callers.

In summary, in the first work we explored the association between the microenvironmental features and the molecular characteristics of melanoma. This work will contribute to the exploration of relevant biomarkers for the prognosis and treatment
of melanoma and accelerate the development of new therapeutic strategies for the disease. In the second work, we developed a deep clustering-based approach to detect somatic mutations from sequencing alignments. By combining an autoencoder for automatic feature extraction with unsupervised clustering, it extracts features and classifies candidate mutations without labeled training data, which facilitates precision medicine for cancer patients.
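The abstract reports correlations between chemokines or their receptors and immune cell infiltration but does not name the correlation measure. The sketch below assumes Spearman rank correlation between a chemokine's expression and an estimated immune cell abundance across samples; the function names (`average_ranks`, `spearman`) and the example values are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for six samples (not data from the study).
cxcl9_expr = [1.2, 3.4, 0.5, 5.1, 2.2, 4.0]
cd8_fraction = [0.05, 0.12, 0.02, 0.20, 0.07, 0.15]
print(spearman(cxcl9_expr, cd8_fraction))  # 1.0: identical rank orders
```

A rank-based measure is a common choice for expression and deconvolution estimates because it does not assume a linear relationship. In practice, `scipy.stats.spearmanr` returns the same coefficient together with a p-value, which significance claims like those in the abstract would require.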
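The abstract describes DECSSV as combining an autoencoder with unsupervised clustering but does not give its clustering objective. As a minimal sketch, assuming a Deep Embedded Clustering (DEC) style objective (Xie et al., 2016), which the abstract does not confirm, the clustering step computes Student's t soft assignments of encoder embeddings to centroids, a sharpened target distribution, and the KL divergence between them. The embeddings, centroids, and function names below are hypothetical; a real model would backpropagate this loss into the encoder weights and centroids, which is omitted here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def soft_assign(z: Sequence[Vector], mu: Sequence[Vector],
                alpha: float = 1.0) -> List[List[float]]:
    """DEC-style soft assignment q_ij of embedding z_i to centroid mu_j
    using a Student's t kernel with alpha degrees of freedom."""
    q = []
    for zi in z:
        kernel = []
        for mj in mu:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mj))
            kernel.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(kernel)
        q.append([k / total for k in kernel])
    return q


def target_distribution(q: List[List[float]]) -> List[List[float]]:
    """Sharpened target p_ij proportional to q_ij^2 / f_j,
    where f_j = sum_i q_ij is the soft cluster frequency."""
    n_clusters = len(q[0])
    f = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(n_clusters)]
        s = sum(w)
        p.append([x / s for x in w])
    return p


def kl_divergence(p: List[List[float]], q: List[List[float]]) -> float:
    """Clustering loss KL(P || Q) summed over all points and clusters."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)


# Hypothetical 2-D embeddings of four candidate sites and two centroids.
z = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]]
mu = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(z, mu)
p = target_distribution(q)
print([row.index(max(row)) for row in q])  # [0, 0, 1, 1]
print(kl_divergence(p, q))
```

Squaring q and dividing by the cluster frequency pushes each point toward its most confident cluster while discouraging one large cluster from absorbing all points. How DECSSV maps clusters to somatic versus non-somatic calls is not stated in the abstract.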
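The abstract compares DECSSV with other somatic mutation callers by F1 score. A minimal sketch of that metric, assuming calls and truth-set variants are matched exactly on a hypothetical `(chrom, pos, ref, alt)` key, is shown below; the variants are invented for illustration.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(calls: Iterable[Hashable],
                        truth: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of a call set against a truth set,
    counting a call as a true positive only on an exact key match."""
    called, true = set(calls), set(truth)
    tp = len(called & true)   # called and present in the truth set
    fp = len(called - true)   # called but absent from the truth set
    fn = len(true - called)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical truth set and call set keyed by (chrom, pos, ref, alt).
truth = {("chr1", 1000, "A", "T"), ("chr1", 2000, "G", "C"),
         ("chr2", 500, "C", "CA"), ("chr3", 750, "TG", "T")}
calls = {("chr1", 1000, "A", "T"), ("chr1", 2000, "G", "C"),
         ("chr2", 500, "C", "CA"), ("chr4", 10, "G", "A")}
print(precision_recall_f1(calls, truth))  # (0.75, 0.75, 0.75)
```

F1 is the harmonic mean of precision and recall, so a caller cannot score well by trading many false positives for high recall. Exact key matching is a simplification: indels usually need to be normalized (for example, left-aligned) before comparison.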