
Research On Sparse Low-rank Representation Model And Its Application In Cancer Sequencing Data

Posted on: 2021-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: C H Lu
GTID: 2434330605460020
Subject: Communication and Information System
Abstract:
Since the beginning of the 21st century, cancer has become one of the major diseases threatening human life, and it is a primary research target in efforts to protect human health. With the rapid development of sequencing technology, cancer sequencing data are being produced continuously. These data provide rich resources for cancer research and promote the development of bioinformatics. However, characteristics of cancer sequencing data such as high dimensionality with small sample sizes, high redundancy, and noise pose challenges for data mining. Accurately and reliably identifying cancer types and selecting key pathogenic genes are of great significance for disease diagnosis and treatment. The sparse low-rank representation model is an effective model for subspace segmentation of high-dimensional data; it has been applied successfully in many fields and offers an effective way to analyze cancer sequencing data. Taking the sparse low-rank representation model as the starting point and targeting the characteristics of cancer sequencing data, this thesis proposes four new methods for exploring cancer sequencing data through cancer sample clustering and feature selection. The main research contents are as follows:

(1) Graph-regularized low-rank representation under sparse and symmetric constraints. This method adds a graph regularization constraint and a symmetric constraint to the sparse low-rank representation model. The graph regularization constraint preserves the local geometric structure of the data, and the symmetric constraint reduces the impact of noise on the data structure. A similarity matrix is constructed from the angular information of the principal directions of the symmetric sparse low-rank representation matrix, and the cancer samples are then clustered by spectral clustering.

(2) Nonnegative sparse low-rank representation optimization model. This model strictly constrains the representation matrix to be nonnegative. It then uses the resulting representation matrix as weights that evaluate the importance of cancer genes, scores the genes with a score function, and selects the characteristic genes. By selecting a feature subset, the method reduces the dimensionality of high-dimensional, small-sample cancer data, and the selected characteristic genes discriminate well between different cancer samples.

(3) Robust hypergraph-regularized weighted sparse low-rank representation. This method uses a maximum likelihood function to handle the heavy noise in cancer sequencing data. It also uses weight information of sample pairs to optimize the sparse low-rank representation matrix, and employs a hypergraph regularization constraint to explore the higher-order geometric structure of the data. This method achieves good results in clustering samples of cancer sequencing data.

(4) Graph-regularized compact sparse low-rank representation on multi-omics data. This method updates the data dictionary by linearly modeling the cancer sequencing data, and exploits the richness and diversity of cancer multi-omics data to process the different types of data cooperatively, thereby fusing the multi-omics information. The resulting sparse low-rank representation matrix contains integrated biological information that improves the clustering of cancer samples.

The main innovation of this thesis is to optimize the sparse low-rank representation model according to the characteristics of cancer sequencing data, yielding four methods: graph-regularized low-rank representation under sparse and symmetric constraints, the nonnegative sparse low-rank representation optimization model, robust hypergraph-regularized weighted sparse low-rank representation, and graph-regularized compact sparse low-rank representation on multi-omics data. These methods are applied to mining and processing cancer sequencing data. Experimental results on data from The Cancer Genome Atlas (TCGA) show that the proposed methods are feasible for sample clustering and feature selection on cancer sequencing data.
Keywords: Sparse low-rank representation, Cancer sequencing data, Cancer sample clustering, Feature selection, Graph regularization
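The abstract does not state the objective functions. A commonly used form of the sparse low-rank representation model, assumed here for illustration and possibly different from the exact formulation in the thesis, is: minimize ||Z||_* + lambda ||Z||_1 + gamma ||E||_{2,1} subject to X = XZ + E, where the columns of X are samples. Solvers for objectives of this kind (typically ADMM or an inexact augmented Lagrange multiplier method) repeatedly apply three proximal operators, sketched below with NumPy. The function names are hypothetical.

```python
import numpy as np


def soft_threshold(A, tau):
    """Proximal operator of tau * ||A||_1: shrink every entry toward zero by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)


def singular_value_threshold(A, tau):
    """Proximal operator of tau * ||A||_* (nuclear norm): shrink the singular values by tau."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale each left singular vector by its shrunken singular value, then rebuild.
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def column_l21_shrink(A, tau):
    """Proximal operator of tau * ||A||_{2,1}: shrink the l2 norm of every column by tau."""
    norms = np.linalg.norm(A, axis=0)
    # Columns with norm <= tau are zeroed; the rest are scaled by (norm - tau) / norm.
    scale = np.where(norms > tau, (norms - tau) / np.maximum(norms, 1e-12), 0.0)
    return A * scale
```

In an ADMM-style solver, one auxiliary copy of Z would be updated with `singular_value_threshold`, another with `soft_threshold`, and the error term E with `column_l21_shrink` (column-wise, so that whole corrupted samples are absorbed by E), each step followed by multiplier updates; that outer loop is omitted here.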
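Method (1) builds its similarity matrix from the angular information of the principal directions of the representation matrix. The sketch below assumes the post-processing from the original low-rank representation work by Liu et al.: take the skinny SVD Z = U S V^T, form M = U S^(1/2), normalize its rows, and set W_ij = |m_i^T m_j|^(2 alpha). It then applies normalized spectral clustering in the Ng-Jordan-Weiss style with a small deterministic k-means. The exponent alpha, the rank cutoff, and the function names are illustrative assumptions, not values from the thesis.

```python
import numpy as np


def angular_affinity(Z, rank=None, alpha=2):
    """Affinity from the principal directions of a representation matrix Z (n x n).

    M = U * S^(1/2) from the SVD of Z, rows normalized to unit length,
    W_ij = |cos(angle between m_i and m_j)| ** (2 * alpha).
    """
    U, s, _ = np.linalg.svd(Z)
    if rank is None:
        rank = int(np.sum(s > 1e-8 * s[0]))  # numerical rank of Z
    M = U[:, :rank] * np.sqrt(s[:rank])
    M = M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), 1e-12)
    return np.abs(M @ M.T) ** (2 * alpha)


def spectral_clustering(W, k, n_iter=100):
    """Normalized spectral clustering on affinity W, followed by a tiny k-means."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(A)                         # eigenvalues in ascending order
    F = vecs[:, -k:]                                    # k leading eigenvectors
    F = F / np.maximum(np.linalg.norm(F, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization of the k-means centers.
    centers = [F[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((F - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)

    def assign(C):
        # Index of the nearest center for every embedded sample.
        return np.argmin(((F[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)

    # Lloyd iterations until the centers stop moving.
    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = np.array([
            F[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers)
```

For a block-diagonal Z (samples from two subspaces), the affinity is close to block-diagonal and the two blocks receive different labels. In practice a library implementation such as scikit-learn's `SpectralClustering` with a precomputed affinity could replace the hand-written k-means.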
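Method (2) scores genes with a score function whose exact form the abstract does not give. As a hypothetical stand-in, the sketch below assumes a nonnegative gene-by-gene representation matrix Z and scores each gene by the l2 norm of its row, a choice that is common in self-representation feature selection; the k highest-scoring genes are kept. The function names and the row-norm score are assumptions.

```python
import numpy as np


def score_genes(Z):
    """Hypothetical score: l2 norm of each row of a gene-by-gene representation matrix."""
    Z = np.maximum(Z, 0.0)  # the model keeps Z nonnegative; clip defensively
    return np.linalg.norm(Z, axis=1)


def select_top_genes(Z, gene_names, k):
    """Return the k highest-scoring gene names and their scores, best first."""
    scores = score_genes(Z)
    order = np.argsort(-scores, kind="stable")[:k]
    return [gene_names[i] for i in order], scores[order]
```

The indices in `order` can then be used to keep only the selected rows of a genes-by-samples expression matrix, which is the dimensionality reduction the abstract describes.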
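Method (3) uses hypergraph regularization to capture higher-order structure among samples. A standard construction, assumed here rather than taken from the thesis, builds one hyperedge per sample from that sample and its k nearest neighbors, weights each hyperedge with a heat kernel, and forms the unnormalized hypergraph Laplacian L_h = D_v - H W D_e^(-1) H^T, where H is the vertex-by-hyperedge incidence matrix. The regularizer is then tr(Z L_h Z^T). The values of k and sigma, the weighting scheme, and the function names are illustrative.

```python
import numpy as np


def hypergraph_laplacian(X, k=3, sigma=1.0):
    """Unnormalized hypergraph Laplacian over the samples (columns) of X."""
    n = X.shape[1]
    sq = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # n x n squared distances
    H = np.zeros((n, n))  # incidence matrix: vertices x hyperedges
    w = np.zeros(n)       # hyperedge weights
    for e in range(n):
        order = np.argsort(sq[e], kind="stable")
        neighbors = [j for j in order if j != e][:k]
        members = [e] + neighbors  # hyperedge e = sample e plus its k nearest neighbors
        H[members, e] = 1.0
        w[e] = np.exp(-sq[e, members] / sigma ** 2).sum()  # heat-kernel weight
    d_v = H @ w         # vertex degree: total weight of incident hyperedges
    d_e = H.sum(axis=0)  # hyperedge degree: number of member vertices (k + 1)
    theta = (H * w) @ np.diag(1.0 / d_e) @ H.T  # H W D_e^-1 H^T
    return np.diag(d_v) - theta


def hypergraph_penalty(Z, L):
    """Regularization term tr(Z L Z^T); small when related samples have similar codes."""
    return float(np.trace(Z @ L @ Z.T))
```

By construction each row of L_h sums to zero and L_h is symmetric positive semidefinite, so the penalty is never negative and can be added directly to a minimization objective.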
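Method (4) fuses several omics data types. One simple way to let them share a single sample-by-sample representation, shown here as an assumption and not as the compact dictionary update used in the thesis, is to center and scale each omics matrix (features by samples, same sample order) and stack the matrices along the feature axis, so that X = XZ + E yields one Z for all data types. The function name `stack_omics` is hypothetical.

```python
import numpy as np


def stack_omics(blocks):
    """Stack omics matrices (features_i x samples) that share the same sample order."""
    n = blocks[0].shape[1]
    normed = []
    for B in blocks:
        if B.shape[1] != n:
            raise ValueError("all omics blocks must share the same samples")
        B = B - B.mean(axis=1, keepdims=True)  # center each feature across samples
        B = B / max(np.linalg.norm(B), 1e-12)  # unit Frobenius norm per block
        normed.append(B)
    return np.vstack(normed)
```

Scaling each block to unit Frobenius norm keeps the omics type with the most features (or the largest numeric range) from dominating the shared representation.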