
Research on Analysis of Cancer Subtypes Based on Multi-omics Data

Posted on: 2022-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Feng
GTID: 2544307154979379
Subject: Engineering
Abstract/Summary:
The same cancer can comprise different subtypes, and determining and identifying these subtypes is key to personalized cancer treatment. Correct subtyping assigns patients with the same or similar clinical manifestations and pathological characteristics to the same subtype, so that an appropriate treatment can be selected for each subtype and the cure rate of cancer improved. In recent years, the rapid iterative development of high-throughput sequencing technology has accumulated a large amount of omics data, which provides strong support for the analysis and prediction of cancer subtypes. Taking multi-omics data as the research object, this study addresses three problems related to cancer subtype identification:

(1) This study extends and updates existing cancer data sets. Cancer data published by the Broad research team were collected and used to construct seven multi-omics data sets suitable for this study.

(2) This study proposes a multiple kernel clustering model based on kernel principal component analysis (KPCA), which uses KPCA to represent the original data. First, features are extracted from the high-dimensional data of the multiple cancer omics types. The extracted features are then transformed into multiple similarity kernel matrices by a kernel transformation, and these are fused by weighting into a single matrix that represents the cancer patients. Finally, a spectral clustering algorithm is applied to obtain the clustering of the different cancer subtypes.

(3) This study combines gene set enrichment analysis (GSEA) with the analysis of cancer subtypes. Compared with single-gene analysis, GSEA can reveal many biological processes that single-gene analysis cannot detect. Based on GSEA, key genes differentially expressed between subtypes were identified; these genes can serve as a basis for identifying the subtypes.

The results show that the KPCA-based multiple kernel clustering model adapts well to different data and can identify biological subtypes on multiple cancer data sets, with clustering performance up to 28% higher than that of the other methods compared. In addition, GSEA associates gene sets of known function with the different subtypes, and the subtype-discriminating key genes found within these functional gene sets have greater practical significance and reliability.
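The modelling pipeline in (2) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the Gaussian (RBF) kernel with a median-heuristic bandwidth, uniform fusion weights, the Ng-Jordan-Weiss form of normalized spectral clustering and the plain k-means step are all choices made here, and every function name is hypothetical. Each omics layer is reduced with kernel PCA, each reduced representation becomes a patient-by-patient similarity kernel, the kernels are combined as a weighted sum, and the fused matrix serves as the affinity for spectral clustering.

```python
import numpy as np

# Hypothetical helpers: a sketch of KPCA-based multiple kernel clustering,
# not the thesis's own code.


def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def rbf_kernel(X, gamma=None):
    """Gaussian kernel exp(-gamma * ||x_i - x_j||^2). Assumption: gamma
    defaults to the median heuristic; the abstract gives no bandwidth."""
    d2 = sq_dists(X)
    if gamma is None:
        off_diag = d2[~np.eye(len(X), dtype=bool)]
        gamma = 1.0 / np.median(off_diag)
    return np.exp(-gamma * d2)


def kernel_pca(X, n_components):
    """Kernel PCA scores of the samples (rows of X) on the top components."""
    K = rbf_kernel(X)
    n = len(K)
    J = np.full((n, n), 1.0 / n)
    Kc = K - J @ K - K @ J + J @ K @ J          # centre the kernel in feature space
    vals, vecs = np.linalg.eigh(Kc)             # eigenvalues in ascending order
    vals = np.clip(vals[::-1][:n_components], 1e-12, None)
    vecs = vecs[:, ::-1][:, :n_components]
    return vecs * np.sqrt(vals)                 # projections: sqrt(lambda_k) * v_k


def fuse_kernels(kernels, weights=None):
    """Convex combination of kernel matrices. Assumption: uniform weights by
    default, since the abstract does not state the weighting scheme."""
    w = np.ones(len(kernels)) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(W, k):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) on affinity W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]                                     # top-k eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # row-normalize
    return kmeans(U, k)


def kpca_multi_kernel_clustering(omics, n_clusters, n_components=10, weights=None):
    """omics: list of (patients x features) arrays, one per omics layer."""
    kernels = [rbf_kernel(kernel_pca(X, n_components)) for X in omics]
    return spectral_clustering(fuse_kernels(kernels, weights), n_clusters)


# Toy usage: two synthetic "omics" layers sharing three hidden subtypes.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 20)
omics = [rng.normal(0, 5, (3, d))[truth] + rng.normal(0, 1, (60, d)) for d in (20, 30)]
labels = kpca_multi_kernel_clustering(omics, n_clusters=3, n_components=5)
```

On this well-separated toy data the three hidden groups should be recovered as three clusters. With real omics matrices the number of components, the kernel bandwidth and the fusion weights would all need tuning, and learning the weights (rather than fixing them) is the usual point of multiple kernel methods.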
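The enrichment step in (3) can likewise be illustrated with a simplified, GSEA-style computation. The sketch assumes that genes are ranked by a signal-to-noise ratio between one subtype and all other patients, scores a gene set with the weighted running-sum (Kolmogorov-Smirnov-like) enrichment statistic, and estimates a nominal p-value by permuting patient labels. The function names are hypothetical, and parts of the full GSEA procedure, such as the normalized enrichment score and false discovery rate control, are omitted.

```python
import numpy as np


def signal_to_noise(expr, in_subtype, eps=1e-8):
    """Per-gene signal-to-noise ratio, one subtype vs. all other patients.
    expr: (genes x patients) array; in_subtype: boolean mask over patients."""
    a, b = expr[:, in_subtype], expr[:, ~in_subtype]
    return (a.mean(axis=1) - b.mean(axis=1)) / (a.std(axis=1) + b.std(axis=1) + eps)


def enrichment_score(scores, in_set, p=1.0):
    """Weighted running-sum enrichment score of a gene set.
    scores: per-gene ranking metric; in_set: boolean mask of set members."""
    order = np.argsort(-scores)                  # most up-regulated genes first
    hits = in_set[order]
    w = np.abs(scores[order]) ** p               # hits weighted by |score|^p
    p_hit = np.cumsum(np.where(hits, w, 0.0)) / w[hits].sum()
    p_miss = np.cumsum(~hits) / (~hits).sum()    # misses weighted uniformly
    running = p_hit - p_miss
    return running[np.argmax(np.abs(running))]   # maximum deviation from zero


def gsea(expr, in_subtype, in_set, n_perm=200, seed=0):
    """Enrichment score plus a nominal p-value from permuting patient labels."""
    rng = np.random.default_rng(seed)
    es = enrichment_score(signal_to_noise(expr, in_subtype), in_set)
    null = np.array([enrichment_score(signal_to_noise(expr, rng.permutation(in_subtype)), in_set)
                     for _ in range(n_perm)])
    # compare only against null scores with the same sign as the observed score
    same_sign = null[null >= 0] if es >= 0 else null[null < 0]
    # add-one correction so the estimated p-value is never exactly zero
    p_value = (np.sum(np.abs(same_sign) >= abs(es)) + 1) / (same_sign.size + 1)
    return es, p_value
```

A positive score means the set's genes concentrate at the top of the ranking (up-regulated in the subtype), a negative one that they concentrate at the bottom. Testing whole gene sets in this way is what lets the analysis surface coordinated changes in biological processes that single-gene tests can miss.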
Keywords/Search Tags: Cancer subtype, Multi-omics data, Kernel principal component analysis, Multiple kernel clustering, Gene set enrichment analysis