Cancer Subtype Clustering Analysis Based On Sparse Reduced-rank Regression Method

Posted on: 2018-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: S G Ge
GTID: 2334330515983878
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Cancer is one of the leading causes of human death. With the development of second-generation sequencing technology, researchers have established large-scale cancer genome sequencing projects (e.g., TCGA) and acquired large amounts of different types of biological data (DNA methylation, mRNA expression, etc.). These data help in understanding the pathogenesis of cancer, identifying tumor subtypes, and designing effective drugs for cancer treatment. However, how to fully integrate and exploit multiple types of omics data when designing clustering algorithms for tumor subtypes has become one of the hot topics in bioinformatics. Conventional clustering methods for tumor subtype identification are still semi-supervised or unsupervised assignments based on a single type of genomic data. Their main disadvantage is that correlated data types cannot be used together in one clustering analysis, so much information is lost. In recent years, a series of clustering algorithms for cancer subtype discovery based on integrative models of multigenomic data have been designed. It should be stressed that these methods are still at an early stage of development and leave many problems unsolved. For example, they must address the preselection of genes, truly integrate the different biological data types, and produce more accurate results. New data analysis methods for discovering cancer subtypes are therefore urgently needed.

In this thesis, our core idea is to use sparse reduced-rank regression (S-rrr) to find new cancer subtypes. The high-dimensional multi-omics data are projected onto a low-dimensional subspace that contains the main biological processes, so that the algorithm ultimately achieves data fusion and fast clustering.

Chapter one introduces the background and research status of cancer subtype discovery. Chapter two outlines the data sources and the integrative clustering algorithms for multigenomic data. In Chapter three, we use an adaptive S-rrr to optimize the iCluster method: S-rrr replaces the optimized principal component analysis (PCA) method for estimating a sparse, reduced-rank initial value of the coefficient matrix. The experimental results show that our method achieved better evaluation metrics. In Chapter four, we develop a dimension-reduction and data-integration method for identifying cancer subtypes, named Scluster. First, Scluster projects each of the original data types into its principal subspace using S-rrr. Then, a fused patient-by-patient network is obtained from these subspace representations through a scaled exponential similarity kernel method. Finally, candidate cancer subtypes are identified using spectral clustering. We demonstrate the efficiency of Scluster on three cancers by jointly analyzing mRNA expression, miRNA expression, and DNA methylation data. The evaluation results and analyses show that Scluster was effective for predicting survival and identified novel cancer subtypes in large-scale multi-omics data. Chapter five discusses some open problems in the study, summarizes our work, and points out directions for future development.
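The three Scluster steps described above (per-data-type projection, fused patient-by-patient network, spectral clustering) can be illustrated with a minimal sketch. This is not the thesis implementation, and every function name and parameter below is hypothetical. A soft-thresholded truncated SVD stands in for S-rrr, the per-data-type kernels are fused by simple averaging rather than by any specific fusion procedure from the thesis, and the spectral clustering uses a small hand-written k-means. The kernel follows the commonly used scaled exponential form exp(-d^2 / (mu * eps)), where eps averages the pairwise distance and the mean nearest-neighbour distances of both patients.

```python
import numpy as np


def sparse_low_rank_projection(X, rank=3, threshold=0.05):
    """Stand-in for S-rrr (assumption): truncated SVD with soft-thresholded loadings."""
    Xc = X - X.mean(axis=0)                       # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:rank].T                               # features x rank loadings
    V = np.sign(V) * np.maximum(np.abs(V) - threshold, 0.0)  # sparsify loadings
    return Xc @ V                                 # patients x rank scores


def scaled_exponential_similarity(Z, k=5, mu=0.5):
    """Patient-by-patient similarity with a locally scaled exponential kernel."""
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))          # pairwise Euclidean distances
    # Mean distance to the k nearest neighbours, skipping the zero self-distance.
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3.0
    return np.exp(-(D ** 2) / (mu * eps + 1e-12))


def spectral_clustering(W, n_clusters, n_iter=50):
    """Normalized spectral clustering followed by a small k-means."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # normalized affinity
    S = (S + S.T) / 2.0                                 # enforce exact symmetry
    _, vecs = np.linalg.eigh(S)                         # eigenvalues ascending
    E = vecs[:, -n_clusters:]                           # top eigenvectors
    E = E / np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    # Deterministic farthest-point initialisation of the cluster centers.
    centers = [E[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((E - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    return labels


def scluster_sketch(omics_list, n_clusters, rank=3):
    """Project each data type, average the kernels (a simplification), cluster."""
    kernels = [
        scaled_exponential_similarity(sparse_low_rank_projection(X, rank=rank))
        for X in omics_list
    ]
    fused = np.mean(kernels, axis=0)              # fused patient-by-patient network
    return spectral_clustering(fused, n_clusters)
```

Each input matrix is assumed to have patients in rows, in the same order across data types, and features in columns. Calling `scluster_sketch([mrna, mirna, methylation], n_clusters=3)` with three such arrays (hypothetical variable names) returns one integer cluster label per patient.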
Keywords/Search Tags: Multigenomic data, Sparse reduced-rank regression, Cluster algorithm, Cancer subtype