Research On Joint Constraint Non-negative Matrix Factorization Method And Its Application In Omics Data

Posted on: 2021-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Hao
GTID: 2430330605458471
Subject: Computer application technology
Abstract/Summary:
Cancer is one of the major diseases that seriously affect human health. With the development of gene chip technology and second-generation sequencing technology, a large amount of cancer omics data has been generated. These data are typically characterized by large volume, rapid growth, high value, and high dimensionality with small sample sizes. Matrix factorization, as an effective dimensionality reduction technique, has been widely used in bioinformatics. Common matrix factorization techniques include Principal Component Analysis (PCA), Vector Quantization (VQ), and Non-negative Matrix Factorization (NMF). As research has progressed, the existing models can no longer meet the growing needs of omics data mining. Therefore, based on omics data from The Cancer Genome Atlas (TCGA), this thesis proposes improvements to existing NMF-related methods, with the aim of providing reference value for the prevention, diagnosis, and treatment of cancer. The research is divided into the following parts:

(1) A Multi-constrained Non-negative Matrix Factorization (MCNMF) method is proposed. The original NMF method is prone to unstable factorization and is sensitive to data noise; MCNMF is designed to avoid these disadvantages while improving performance. The structural information of the original data is used to guide the matrix factorization. By retaining the structural information between data points during the factorization, the coefficient matrix can be well aligned with the actual distribution of the original data. In addition, applying the L2,1-norm constraint to NMF enhances the robustness of the model and reduces noise interference to a certain extent.

(2) A Hyper-graph Regularized Discriminative Non-negative Matrix Factorization (HDNMF) method is proposed. NMF with simple-graph regularization is not discriminative, and a simple graph cannot accurately reflect the higher-order geometric structure of the data. To solve these problems, the HDNMF method captures higher-order geometric structure by constructing a hyper-graph instead of a simple graph, and the introduction of label information makes the method discriminative.

(3) An integrated Robust Graph Regularization Non-negative Matrix Factorization (iRGNMF) method is proposed. With the generation of "multi-view omics data", techniques for analyzing multi-view omics data for the same cancer have developed rapidly, but the original NMF cannot handle such data. To address this shortcoming, graph regularization is first introduced into the NMF method to capture the geometric structure of the data; second, the L2,1-norm is introduced to improve the robustness of the method; finally, the method is extended into an integrated model to better handle multi-view omics data.

(4) A Sparsely Constrained Deep Semi-Non-negative Matrix Factorization (SCDSMF) method is proposed. Most NMF-based models have a single-layer structure and may perform poorly on complex data, whereas deep learning, with its well-designed hierarchical structure, shows significant advantages in learning features from data. On the one hand, Deep Semi-Non-negative Matrix Factorization (Deep Semi-NMF) is applied to the integrated omics data; on the other hand, an L1-norm penalty is applied to the basis matrix and the coefficient matrix of each layer. In addition, Nesterov's accelerated gradient algorithm is used to speed up the convergence of the iterative optimization, and the computational complexity of the method is discussed to demonstrate its efficiency.

Various experiments show that the methods proposed in this thesis outperform existing similar methods: they achieve better clustering and classification results and identify more key genes related to cancer.
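
As an illustration of how graph regularization and an L2,1-norm loss can be combined in NMF, the following is a minimal NumPy sketch. It is not the thesis's MCNMF or iRGNMF algorithm, whose exact objectives and update rules are not given in this abstract. It assumes the objective ||X - U V^T||_{2,1} + lam * Tr(V^T L V), where L = D - W is the Laplacian of a k-nearest-neighbour graph over the samples (the columns of X), and it uses the standard reweighted multiplicative updates for that kind of objective. The function names (`l21_norm`, `knn_graph`, `robust_graph_nmf`) and all parameter defaults are hypothetical.

```python
import numpy as np


def l21_norm(E):
    """L2,1-norm used here: sum of the Euclidean norms of the columns of E."""
    return np.linalg.norm(E, axis=0).sum()


def knn_graph(X, k=3):
    """Symmetric 0/1 k-nearest-neighbour graph over the columns (samples) of X."""
    n = X.shape[1]
    sq = (X ** 2).sum(axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * X.T @ X  # squared distances
    np.fill_diagonal(dist, np.inf)                    # exclude self-neighbours
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                         # symmetrize


def robust_graph_nmf(X, rank, lam=0.1, k=3, n_iter=200, eps=1e-10, seed=0):
    """Sketch of L2,1-loss NMF with a graph Laplacian regularizer on V.

    X is (features x samples) and non-negative; returns U (features x rank),
    V (samples x rank) and the objective value before each iteration plus
    the final value.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank)) + eps
    V = rng.random((n, rank)) + eps
    W = knn_graph(X, k)
    D = np.diag(W.sum(axis=1))
    L = D - W

    def objective(U, V):
        return l21_norm(X - U @ V.T) + lam * np.sum(V * (L @ V))

    history = []
    for _ in range(n_iter):
        history.append(objective(U, V))
        # Reweighting: g_j = 1 / ||x_j - U v_j||, so noisy samples with a
        # large residual get a small weight in the updates below.
        resid = np.linalg.norm(X - U @ V.T, axis=0)
        g = 1.0 / np.maximum(resid, eps)
        XG = X * g                                    # X @ diag(g)
        U *= (XG @ V) / (U @ ((V.T * g) @ V) + eps)
        # The factor 2 comes from the gradient of lam * Tr(V^T L V).
        numer = XG.T @ U + 2.0 * lam * (W @ V)
        denom = g[:, None] * (V @ (U.T @ U)) + 2.0 * lam * (D @ V) + eps
        V *= numer / denom
    history.append(objective(U, V))
    return U, V, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic non-negative data: 30 features, 20 samples, rank 3 plus noise.
    X = rng.random((30, 3)) @ rng.random((3, 20)) + 0.01 * rng.random((30, 20))
    U, V, history = robust_graph_nmf(X, rank=3)
    print("objective: start", history[0], "end", history[-1])
```

In this sketch each row of V is the low-dimensional representation of one sample, so a clustering step (for example k-means on the rows of V) could follow. The multiplicative form keeps U and V non-negative as long as X and the initial factors are non-negative.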
Keywords/Search Tags: Non-negative Matrix Factorization, Hyper-graph Regularization, Graph Regularization, L2,1-norm, L1-norm, The Cancer Genome Atlas (TCGA)