Research On Breast Cancer Subtypes Clustering Model Based On Multi-omics Data Fusion

Posted on: 2022-04-26 | Degree: Master | Type: Thesis
Country: China | Candidate: S S Liu
GTID: 2504306323496374 | Subject: Information and Communication Engineering
Abstract/Summary:
Breast cancer is a malignant tumor with a high incidence in women, and it is also the second most common type of cancer in the world after lung cancer. Discovering and identifying cancer subtypes is key to the diagnosis, prognosis and treatment of cancer, and it is of great practical significance for realizing personalized precision medicine for cancer patients. With the development of high-throughput sequencing technology, a large amount of cancer multi-omics data has accumulated. Cancer multi-omics data are high-dimensional and noisy and have small sample sizes, which poses new challenges to traditional data mining and analysis techniques. In recent years, machine learning has made unprecedented progress in processing big data; in particular, some deep learning algorithms have shown good performance in data mining and analysis and have broad application prospects in bioinformatics. How to use machine learning to integrate and analyze cancer multi-omics data in order to discover and identify cancer subtypes is a hot research topic in bioinformatics. Based on the breast cancer multi-omics data provided by the TCGA database, this thesis studies breast cancer subtype clustering models based on multi-omics data fusion and explores methods for the preprocessing and feature extraction of cancer multi-omics data. The main work covers the following three aspects:
1. A new clustering model of breast cancer subtypes based on a Deep Belief Network (DBN) is proposed. To ensure the biological relevance of the detected breast cancer subtypes, prior biological knowledge is incorporated to guide the representation learning of the DBN. The experimental results were analyzed and validated using the silhouette coefficient, the Cox log-rank P-value and Kaplan-Meier survival analysis, and they showed that the clustering model based on prior knowledge performs better than traditional clustering algorithms.
2. A breast cancer subtype clustering model based on multi-dimensional genomics data fusion is proposed, where the multi-dimensional genomics data mainly comprise gene expression, miRNA expression and DNA methylation data. The model consists of two parts: a multi-dimensional genomics data fusion module based on an autoencoder and a stacked autoencoder, and a clustering module based on prior knowledge. The principles of data representation and fusion with the autoencoder and the stacked autoencoder are introduced in detail, and the clustering module based on prior knowledge is used to cluster the fused data. Finally, experiments were designed to verify the performance of the model.
3. A breast cancer subtype clustering model based on multi-omics data fusion is proposed. The multi-omics data include gene expression, miRNA expression, DNA methylation, copy number variation and clinical data. In the data preprocessing and feature extraction stage, the KPCA (kernel principal component analysis) algorithm was used to extract and reduce the features of the gene expression, miRNA expression and DNA methylation data, and a statistical analysis algorithm was used to extract the copy number variation features. Because the clinical data have few features, the features remaining after data cleaning are retained directly as the clinical features. The construction of the data kernels and the multi-kernel fusion method are introduced in detail. Finally, the fused data were clustered with the clustering module based on prior knowledge, and experiments were designed to verify the performance of the model.
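The abstract states that KPCA was used to extract and reduce the features of the gene expression, miRNA expression and DNA methylation data, but it does not give the kernel or its parameters. A minimal NumPy sketch of standard kernel PCA follows, assuming an RBF kernel with a hypothetical width `gamma` and using simulated data in place of TCGA profiles; the helper names are illustrative and are not taken from the thesis.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off

def center_kernel(K):
    """Double-center K so the implicit feature vectors have zero mean."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n

def kernel_pca(X, n_components, gamma):
    """Project samples (rows of X) onto the leading kernel principal components."""
    Kc = center_kernel(rbf_kernel(X, gamma))
    eigvals, eigvecs = np.linalg.eigh(Kc)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of the training samples on component k is sqrt(lambda_k) * v_k
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

# Usage on simulated data (NOT real TCGA data): 50 samples x 200 "genes"
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 200))
Z = kernel_pca(expr, n_components=10, gamma=1.0 / expr.shape[1])
print(Z.shape)  # (50, 10)
```

In practice each omics type would get its own `gamma` and number of components, chosen for example by explained variance; the values above are placeholders.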
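For the multi-kernel fusion step, the abstract does not specify how the per-omics kernels are weighted, nor how the prior-knowledge clustering module works. The sketch below is therefore only a stand-in under stated assumptions: it builds one RBF kernel per omics view, fuses them by a weighted average (equal weights by default), and clusters the fused kernel with normalized spectral clustering. Spectral clustering is a swapped-in technique, not the author's module, and all function names and parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fuse_kernels(kernels, weights=None):
    """Convex combination of per-omics kernels; equal weights by default (an assumption)."""
    w = np.ones(len(kernels)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d2)])  # next center: point farthest from chosen ones
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(K, k, seed=0):
    """Normalized spectral clustering of an affinity matrix K (stand-in clustering module)."""
    d_inv_sqrt = 1.0 / np.sqrt(K.sum(axis=1))
    A = d_inv_sqrt[:, None] * K * d_inv_sqrt[None, :]  # D^{-1/2} K D^{-1/2}
    _, vecs = np.linalg.eigh(A)
    U = vecs[:, -k:]                                    # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)    # project rows onto the unit sphere
    return kmeans(U, k, seed=seed)

# Usage on two simulated "omics" views sharing a hidden 2-group structure (NOT real data)
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 30)
view_expr = rng.normal(size=(60, 50)) + 3.0 * truth[:, None]
view_meth = rng.normal(size=(60, 20)) + 3.0 * truth[:, None]
kernels = [rbf_kernel(v, gamma=1.0 / v.shape[1]) for v in (view_expr, view_meth)]
labels = spectral_clustering(fuse_kernels(kernels), k=2)
```

A weighted average is a natural baseline because a convex combination of positive semidefinite matrices is itself positive semidefinite, so the fused matrix is still a valid kernel; learning the weights, as in multiple kernel learning, is a common alternative.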
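The silhouette coefficient used to evaluate the clusters compares, for each sample i, the mean distance a_i to the other members of its own cluster with the mean distance b_i to the nearest other cluster: s_i = (b_i - a_i) / max(a_i, b_i), averaged over all samples. A small NumPy sketch follows, assuming Euclidean distance (the abstract does not name the metric); the function name is illustrative.

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient with Euclidean distance.

    Requires at least two clusters; singleton clusters get s(i) = 0 by convention.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster (D[i, i] = 0)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return s.mean()

X = np.array([[0.0], [0.1], [10.0], [10.1]])
print(round(silhouette_score(X, [0, 0, 1, 1]), 3))  # 0.99
```

Values near 1 indicate compact, well-separated clusters; values below 0 indicate that many samples sit closer to another cluster than to their own.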
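The survival-based validation relies on the Kaplan-Meier estimator and the log-rank test. The sketch below implements both in NumPy for the two-group case only, computing the p-value from the chi-square distribution with one degree of freedom, P(X > x) = erfc(sqrt(x / 2)). Comparing more than two subtypes needs k - 1 degrees of freedom and a general chi-square tail function (for example from SciPy). The function names are illustrative and the data are a toy example, not results from the thesis.

```python
import math
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate at each distinct event time."""
    time, event = np.asarray(time, dtype=float), np.asarray(event, dtype=bool)
    times = np.unique(time[event])
    surv, s = [], 1.0
    for t in times:
        n_at_risk = np.sum(time >= t)
        n_deaths = np.sum((time == t) & event)
        s *= 1.0 - n_deaths / n_at_risk  # product-limit update
        surv.append(s)
    return times, np.array(surv)

def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value) with 1 degree of freedom."""
    time = np.asarray(time, dtype=float)
    event, group = np.asarray(event, dtype=bool), np.asarray(group)
    levels = np.unique(group)
    if len(levels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    in_g1 = group == levels[0]
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        died = (time == t) & event
        n, n1 = at_risk.sum(), (at_risk & in_g1).sum()
        d, d1 = died.sum(), (died & in_g1).sum()
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    stat = o_minus_e ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail

# Toy survival data (NOT real TCGA data): subtype A dies early, subtype B late
time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
event = [1] * 10
subtype = ["A"] * 5 + ["B"] * 5
stat, p = logrank_test(time, event, subtype)
```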
Keywords/Search Tags:Breast cancer subtypes, TCGA database, Multi-omics data, Data fusion, Clustering model