
Research On Clustering Analysis Of Cancer Subtypes Based On Genomics Data

Posted on: 2017-04-21 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: T S Xu
GTID: 1224330491459955 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Identifying cancer subtypes is an important component of personalised medicine, because correctly stratifying patients into subtypes increases the chance of providing the best treatment option. With the development and application of genomic technologies, it is now possible to obtain high-throughput sequencing data for cancer cases. This gives researchers the opportunity to study individual differences between cancer cases and to explore the mechanisms of cancer occurrence, development and metastasis at the genome-wide level. However, cancer genomics data form a biological big data set that spans multiple profiles with high-dimensional features. High dimensionality, high noise and small sample sizes are common characteristics of these data sets, and together they pose new challenges for traditional data mining techniques. Thanks to the rapid development of genomic technologies, a large number of genomic datasets of cancer samples have now accumulated. Developing big-data mining methods that process these cancer genomics data and uncover biologically meaningful cancer subtypes, together with their corresponding tumor molecular biomarkers, is therefore of great practical significance for cancer research and therapy.

In this paper, we focus on clustering analysis methods for identifying cancer subtypes and on related key issues, such as the processing and fusion of cancer genomics data. We also explore novel clustering algorithms for cancer subtype identification.

Cancer genomics is a sub-field of genomics that associates cancer with genes using high-throughput sequencing technologies. Gene chips (microarrays) and second-generation sequencing are the main sources of current cancer genome data, and this paper discusses the technical characteristics and details of both tools.
In addition, The Cancer Genome Atlas (TCGA), the largest cancer genome research project to date, is introduced comprehensively.

This paper constructs an analysis framework for cancer subtype identification. The framework includes cancer genomics data pre-processing, feature selection methods for cancer genomics data, clustering algorithms, and evaluation methods for cancer subtypes. First, we give a detailed introduction to genomics data pre-processing, which includes data filtering, data imputation and data normalization. We then present four feature selection methods. Clustering algorithms are the core of cancer subtype identification, and we discuss four of them in detail: Consensus clustering, Consensus nonnegative matrix factorization, Integrative clustering of multiple genomic data, and Similarity network fusion. To evaluate the identified cancer subtypes, this paper uses criteria such as survival analysis, the silhouette method and the statistical significance of clustering.

Data-mining-based clustering of multi-genomics data is a very effective approach to cancer subtype identification and has led to many discoveries and applications, and new computational methods for this problem are still being developed. However, existing methods are purely computational, and because of the complexity of the life sciences, purely machine-learning methods cannot cope with the cancer subtype identification problem on their own. In this paper, we bring gene regulatory network analysis into the fusion clustering of multi-genomics data. To this end, we present the Weighted similarity network fusion algorithm, which integrates a miRNA-TF-mRNA regulatory network with cancer genomics expression data. Our method can identify cancer subtypes that are consistent with biological meaning.
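The pre-processing and feature selection steps described above (filtering, imputation, normalization, then feature selection) can be sketched as follows. This is a minimal illustration, not the dissertation's implementation. The abstract does not name its imputation or feature selection methods, so mean imputation and top-variance ranking are stand-ins, and the function name `preprocess` and its thresholds are hypothetical. Features are rows and samples are columns. Variance ranking runs before z-score normalization because every standardized feature has unit variance.

```python
import numpy as np


def preprocess(X, max_missing_frac=0.2, n_top=None):
    """Filter, impute, select and normalize a features-by-samples matrix.

    X uses np.nan for missing values. Mean imputation and top-variance
    selection are illustrative stand-ins, not the dissertation's methods.
    Returns the processed matrix and the indices of the retained features.
    """
    X = np.asarray(X, dtype=float)

    # Filtering: drop features (rows) with too many missing values.
    keep = np.flatnonzero(np.isnan(X).mean(axis=1) <= max_missing_frac)
    X = X[keep]  # integer indexing returns a copy; the input is not modified

    # Imputation: fill each remaining gap with that feature's observed mean.
    row_means = np.nanmean(X, axis=1)
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = row_means[rows]

    # Feature selection: keep the n_top most variable features (done before
    # normalization, since z-scored features all have unit variance).
    if n_top is not None and n_top < X.shape[0]:
        top = np.sort(np.argsort(X.var(axis=1))[::-1][:n_top])
        X, keep = X[top], keep[top]

    # Normalization: z-score each feature across samples; constant features
    # get a unit divisor so they become all zeros instead of NaN.
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=1, keepdims=True)) / std
    return X, keep
```

On a toy 4 x 4 matrix, `preprocess(X, max_missing_frac=0.25, n_top=2)` drops a feature that is 75% missing at the filtering step and a constant feature at the variance step. It returns the two remaining features, z-scored.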
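Consensus clustering, the first of the four algorithms listed above (Monti et al., 2003), repeatedly clusters random subsets of the samples. It records how often each pair of samples ends up in the same cluster. The sketch below builds that consensus matrix under two illustrative assumptions: plain k-means is the base clusterer, and each resample draws 80% of the samples. The helpers `kmeans` and `consensus_matrix` are hypothetical names. A final partition is usually obtained by clustering 1 − M (for example hierarchically), and that step is omitted here.

```python
import numpy as np


def kmeans(data, k, rng, n_iter=100):
    """Lloyd's k-means on the rows of `data`; returns a label per row."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty).
        new_centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def consensus_matrix(samples, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples was co-clustered.

    samples: (n_samples, n_features). Subsampling without replacement and
    k-means as the base clusterer are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n = len(samples)
    together = np.zeros((n, n))  # times i and j fell in the same cluster
    drawn = np.zeros((n, n))     # times i and j were drawn in the same resample
    m = max(k, int(round(frac * n)))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(samples[idx], k, rng)
        together[np.ix_(idx, idx)] += (labels[:, None] == labels[None, :]).astype(float)
        drawn[np.ix_(idx, idx)] += 1.0
    # Pairs never drawn together get consensus 0.
    return np.divide(together, drawn, out=np.zeros_like(together), where=drawn > 0)
```

For well-separated groups, M is close to 1 within each group and close to 0 between groups. A blurred block structure suggests that k does not match the data.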
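Similarity network fusion (Wang et al., 2014) underlies the weighted algorithm proposed above. It builds one patient-similarity network per data type. For a fixed number of iterations t, each network is updated by diffusing the average of the other networks through its own K-nearest-neighbour graph, and the results are then averaged. The sketch below follows that published scheme in simplified form. K, mu and t are illustrative defaults, and the helper names are hypothetical. The dissertation's weighting by the miRNA-TF-mRNA regulatory network is not reproduced, because the abstract does not specify how it enters the computation.

```python
import numpy as np


def affinity(X, K, mu):
    """Scaled exponential kernel W for one data type (rows are samples)."""
    rho = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))  # Euclidean distances
    # Mean distance to each sample's K nearest neighbours
    # (column 0 of the sorted row is normally the sample itself, at distance 0).
    nn_mean = np.sort(rho, axis=1)[:, 1:K + 1].mean(axis=1)
    eps = (nn_mean[:, None] + nn_mean[None, :] + rho) / 3.0
    return np.exp(-rho ** 2 / np.maximum(mu * eps, np.finfo(float).eps))


def full_kernel(W):
    """P: off-diagonal entries of each row sum to 1/2, diagonal set to 1/2."""
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return (P + P.T) / 2.0  # symmetrize


def knn_kernel(W, K):
    """S: keep each row's K largest affinities (including the sample itself,
    whose affinity exp(0) = 1 is maximal), renormalized to sum to 1."""
    S = np.zeros_like(W)
    for i in range(len(W)):
        nbrs = np.argsort(W[i])[-K:]
        S[i, nbrs] = W[i, nbrs] / W[i, nbrs].sum()
    return S


def snf(views, K=5, t=10, mu=0.5):
    """Fuse two or more data types (same samples, same row order) into one matrix."""
    Ws = [affinity(X, K, mu) for X in views]
    Ps = [full_kernel(W) for W in Ws]
    Ss = [knn_kernel(W, K) for W in Ws]
    m = len(views)
    for _ in range(t):
        new_Ps = []
        for v in range(m):
            # Diffuse the other views' average network through view v's local kNN graph.
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            P = Ss[v] @ others @ Ss[v].T
            new_Ps.append((P + P.T) / 2.0)
        Ps = new_Ps  # all views are updated in parallel
    return sum(Ps) / m
```

In the SNF paper, the fused matrix is then partitioned with spectral clustering to obtain subtypes.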
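The silhouette method listed among the evaluation criteria scores each sample i by s(i) = (b(i) − a(i)) / max(a(i), b(i)). Here a(i) is the sample's mean distance to the other members of its own cluster, and b(i) is its mean distance to the nearest other cluster. Values near 1 indicate compact, well-separated subtypes, and negative values suggest a sample is closer to another cluster than to its own. A minimal sketch working from a precomputed distance matrix follows. The name `silhouette_scores` is hypothetical, and singleton clusters are assigned 0 by convention.

```python
import numpy as np


def silhouette_scores(D, labels):
    """Silhouette width s(i) = (b - a) / max(a, b) for each sample.

    D: symmetric (n, n) distance matrix; labels: cluster label per sample
    (at least two distinct clusters are required).
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i, own in enumerate(labels):
        mates = labels == own
        mates[i] = False
        if not mates.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, mates].mean()  # mean distance to the rest of its own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != own)  # nearest other cluster
        s[i] = 0.0 if max(a, b) == 0 else (b - a) / max(a, b)
    return s
```

Consider the one-dimensional points 0, 1, 10 and 11 split into {0, 1} and {10, 11}. Each s(i) falls between about 0.89 and 0.90, reflecting two tight, well-separated groups. Pairing 0 with 10 and 1 with 11 instead gives negative widths.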
Keywords/Search Tags: Cancer, Cancer subtype, Cancer genomics, The Cancer Genome Atlas (TCGA), Gene regulatory network, Data mining, Clustering analysis