Identifying cancer subtypes is of great significance for personalized medicine. The task uses unsupervised clustering to divide cancer patients into subtypes, providing a valuable reference for subsequent treatment planning. In recent years, the rapid development of sequencing technology has produced rich multi-omics data, bringing unprecedented opportunities for discovering cancer subtypes at a global level. However, owing to the limitations of sequencing technology, omics data often contain considerable noise. In addition, although sequencing costs are declining, it is still difficult to cover a large number of samples, so most current omics data are high-dimensional with small sample sizes and contain many redundant features. Therefore, effectively integrating these heterogeneous omics data to reveal latent cancer sample categories remains a challenging task. To address these problems, this thesis proposes two algorithms, described as follows:

(1) A cancer subtype recognition algorithm based on multi-view subspace clustering and adaptive nearest neighbor learning is proposed. The algorithm integrates low-rank subspace representation learning and adaptive nearest neighbor learning into a unified framework. First, a self-representation matrix is generated for the cancer samples in each view through low-rank subspace learning. Then, based on this representation, the adaptive nearest neighbor learning strategy is used to obtain a global similarity matrix. In addition, the algorithm adaptively adjusts the weight of each view during the update process. Finally, an effective augmented Lagrange multiplier algorithm is designed to optimize the proposed framework. Experimental results on cancer data sets and image data sets confirm the effectiveness of the method. In addition, this thesis takes melanoma as a case study and suggests differences in biomolecular function among the subtypes.

(2) A cancer subtype recognition algorithm based on sparse representation and adaptive nearest neighbor learning is proposed. First, the algorithm reduces the influence of noise and high-dimensional redundant features in each omics data set through sparse dimensionality reduction. Then, based on the sparse representation, the algorithm uses the adaptive nearest neighbor learning strategy to obtain the global similarity view. In particular, the algorithm uses locality-constrained linear coding to obtain a locally smoother sparse representation, ensuring that samples with similar characteristics also have similar sparse representations. Finally, a fast and effective iterative optimization strategy is designed to solve for the variables in the objective function. Experimental results on cancer data sets demonstrate the effectiveness of the method.
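
The adaptive nearest neighbor learning step shared by both algorithms can be illustrated with a minimal sketch. This is not the implementation from this thesis. It assumes one commonly used closed-form solution, in which each sample distributes a probability mass of 1 over its k nearest neighbors and closer neighbors receive larger weights. The function name `adaptive_neighbor_similarity`, the parameter `k`, and the use of squared Euclidean distances on a single representation matrix are hypothetical choices made for illustration; the multi-view weighting and the joint optimization described above are omitted.

```python
import numpy as np


def adaptive_neighbor_similarity(X, k=5):
    """Row-wise adaptive-neighbor similarity (illustrative sketch only).

    X : (n_samples, n_features) representation of the samples.
    k : number of neighbors that receive non-zero similarity.
    Returns an (n, n) matrix S with non-negative rows that each sum to 1.
    """
    n = X.shape[0]
    if n < k + 2:
        raise ValueError("need at least k + 2 samples")

    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(D, 0.0, out=D)  # clip tiny negatives from round-off

    S = np.zeros((n, n))
    for i in range(n):
        d = D[i].copy()
        d[i] = np.inf                   # a sample is not its own neighbor
        idx = np.argsort(d)[: k + 1]    # the k + 1 closest other samples
        ds = d[idx]
        # Assumed closed form: s_ij = (d_{k+1} - d_j) / (k * d_{k+1} - sum_{h<=k} d_h)
        denom = k * ds[k] - ds[:k].sum()
        if denom <= 1e-12:
            S[i, idx[:k]] = 1.0 / k     # all neighbors equidistant: uniform weights
        else:
            S[i, idx[:k]] = (ds[k] - ds[:k]) / denom
    return S
```

The resulting matrix is generally asymmetric; a symmetric affinity such as `(S + S.T) / 2` would typically be formed before any spectral clustering step, which is again an assumption rather than a detail stated in the abstract.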