Research On Breast Cancer Subtypes Clustering Model Based On Multi-omics Data Fusion

Posted on:2022-04-26

Degree:Master

Type:Thesis

Country:China

Candidate:S S Liu

Full Text:PDF

GTID:2504306323496374

Subject:Information and Communication Engineering

Abstract/Summary:

Breast cancer is a malignant tumor with a high incidence in women and is the second most common cancer worldwide after lung cancer. Discovering and identifying cancer subtypes is key to the diagnosis, prognosis, and treatment of cancer, and is of practical importance for personalized precision medicine. With the development of high-throughput sequencing technology, a large amount of cancer multi-omics data has accumulated. Such data are high-dimensional and noisy and have small sample sizes, which poses new challenges for traditional data mining and analysis techniques. In recent years, machine learning has made rapid progress in processing large-scale data; deep learning algorithms in particular have performed well in data mining and analysis and have broad application prospects in bioinformatics. How to use machine learning to integrate and analyze cancer multi-omics data in order to discover and identify cancer subtypes is an active research direction in bioinformatics.

Based on the breast cancer multi-omics data provided by the TCGA database, this thesis studies breast cancer subtype clustering models based on multi-omics data fusion and explores methods for preprocessing and feature extraction of cancer multi-omics data. The main work comprises the following three parts:

1. A new clustering model for breast cancer subtypes based on a Deep Belief Network (DBN) is proposed. To ensure the biological relevance of the detected subtypes, prior biological knowledge is incorporated to guide the representation learning of the DBN. The experimental results were analyzed and verified using the silhouette coefficient, the Cox log-rank p-value, and Kaplan-Meier survival analysis. They showed that the clustering model based on prior knowledge performs better than the traditional clustering algorithm.

2. A breast cancer subtype clustering model based on multi-dimensional genomics data fusion is proposed. The multi-dimensional genomics data mainly comprise gene expression, miRNA expression, and DNA methylation data. The model consists of two parts: a data fusion module based on autoencoders and stacked autoencoders, and a clustering module based on prior knowledge. The principles of data representation and fusion with autoencoders and stacked autoencoders are described in detail, the prior-knowledge clustering module is used to cluster the fused data, and experiments were designed to verify the performance of the model.

3. A breast cancer subtype clustering model based on multi-omics data fusion is proposed. The multi-omics data comprise gene expression, miRNA expression, DNA methylation, copy number variation, and clinical data. In the preprocessing and feature extraction stage, the KPCA algorithm was used to extract features from, and reduce the dimensionality of, the gene expression, miRNA expression, and DNA methylation data, and a statistical analysis algorithm was used to extract features from the copy number variation data. Because the clinical data have few features, the features remaining after data cleaning were retained directly. The principles of kernel construction and the method of multi-kernel fusion are described in detail. Finally, the fused data were clustered with the prior-knowledge clustering module, and experiments were designed to verify the performance of the model.
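The abstract does not give the kernel form, the fusion weights, or the clustering details, so the following is only a minimal sketch of the general idea behind kernel-based fusion and silhouette evaluation, not the thesis's actual method. It assumes a Gaussian (RBF) kernel per omics view, an equal-weight sum as the fusion rule, and cluster labels that are already known. All function names, the `gamma` value, and the toy numbers are hypothetical and are not taken from TCGA.

```python
import math

def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix for the rows of X (a list of feature lists)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq)
    return K

def fuse_kernels(kernels, weights=None):
    """Weighted sum of kernel matrices; equal weights are an assumption."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def kernel_distance(K):
    """Distance induced by a kernel: d(i, j) = sqrt(K_ii + K_jj - 2 * K_ij)."""
    n = len(K)
    return [[math.sqrt(max(K[i][i] + K[j][j] - 2.0 * K[i][j], 0.0))
             for j in range(n)] for i in range(n)]

def silhouette(D, labels):
    """Mean silhouette coefficient from a distance matrix (needs >= 2 clusters)."""
    n = len(labels)
    clusters = sorted(set(labels))
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(D[i][j] for j in own) / len(own)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(D[i][j] for j in members) / len(members)
            for members in ([j for j in range(n) if labels[j] == c]
                            for c in clusters if c != labels[i])
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Toy data: four samples seen through two hypothetical omics views.
expr_view = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
meth_view = [[1.0], [1.1], [3.0], [3.1]]

K = fuse_kernels([rbf_kernel(expr_view, 0.5), rbf_kernel(meth_view, 0.5)])
D = kernel_distance(K)
score_good = silhouette(D, [0, 0, 1, 1])  # grouping that matches the toy structure
score_bad = silhouette(D, [0, 1, 0, 1])   # grouping that splits the natural pairs
```

On this toy input the grouping that follows the structure of the data scores higher than the one that splits it, which is the property the silhouette coefficient is used to check. A survival-based check such as the log-rank test would need follow-up times and event indicators, which are not sketched here.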

Keywords/Search Tags:

Breast cancer subtypes, TCGA database, Multi-omics data, Data fusion, Clustering model

Related items

1	Cancer Subtypes Identification Based On Multi-omics Graph Clustering
2	Research On Analysis Of Cancer Subtypes Based On Multi-omics Data
3	Multi-omics Data Integration Analysis Method And System Based On Deep Clustering Model And Traditional Model
4	Research On Gastric Cancer Subtypes Classification Model Based On Fusion Data Of Multi-omics
5	A Study On Cancer Typing Based On Spectral Clustering Algorithm
6	Research On Breast Cancer Survival Prediction Based On Deep Learning And Omics Data Fusion
7	Research Of Prognostic Carcinoma Molecular Subtypes Based On Omics Data
8	Research On Application Of Multi-omics Data Fusion Algorithm Based On Heterogeneous Graph Neural Network In Tumor Classification
9	Intelligent Recognition Of Tumor Subtypes Based On Multi-omics Data
10	Research On Cancer Subtype Clustering Based On Stacked Autoencoder