With the continuous development of biological big data research, more and more gene sequencing data have been generated. These data have been used to gain a deeper understanding of the pathogenesis of major diseases such as cancer, providing new ideas for cancer diagnosis and treatment. Studying the relationship between genes and cancers can also improve the existing theoretical framework of genetics. However, finding pathogenic feature genes among tens of thousands of genes is not easy; it is a typical high-dimensional, small-sample problem. At present, many classical matrix-based algorithms are used to analyze high-order, multi-dimensional data. However, matrix-based algorithms have an obvious drawback: they can only handle data from different modes separately, which may destroy the inherent spatial structure of the data, so they cannot effectively capture multi-view information. Tensor decomposition is gaining significant attention as an effective means of handling multi-dimensional, large-scale data. It preserves the information of the different modes in multi-view data and effectively overcomes this deficiency of matrix methods. Therefore, data analysis models based on tensor decomposition have gradually become a research hotspot. The purpose of this dissertation is to extract relevant feature genes from tumor data and to try to discover new tumor subtypes. To this end, four reliable omics data analysis models are proposed to explore the potential relationships between genes and cancers, providing a reference for guiding biological experimental research. The main research contents are as follows.

(1) A robust tensor model based on correntropy and tensor singular value decomposition is proposed to address the problem that outliers in large-scale omics data may lead to incomplete screening of cancer feature genes. The model combines tensor singular value decomposition and correntropy regularization to analyze multi-view data. Applying correntropy regularization to increase the sparsity of the sparse tensor makes full use of the important structural information in the tensor data. At the same time, as a nonlinear local similarity measure, correntropy assigns smaller weights to outliers during optimization, so it can effectively suppress them. In addition, tensor singular value decomposition avoids destroying the internal spatial structure of the tensor data and retains important information in the low-rank component, which improves clustering performance.

(2) To address the problem that traditional tensor decomposition methods cannot fully extract the low-rank structure, a tensor decomposition model based on improved low-rank representation and sparse structure is proposed. The model decomposes the original space into a sparse space and a low-rank space. The L2,1-norm, which has a good sparsifying effect, is applied to the sparse space to obtain the feature tensor. Because the L2,1-norm promotes row sparsity, it improves the efficiency of selecting more representative feature genes. In addition, the L2,1-norm can effectively mitigate the impact of outliers and improve the robustness of the model. At the same time, an improved tensor nuclear norm is introduced into the model and works together with the L2,1-norm to make full use of the low-rank information. This further extracts the low-rank component under the singular value decomposition, so that the low-rank component is extracted more completely.

(3) To address the problems that traditional models cannot make full use of prior knowledge of the data and cannot adequately preserve relevant high-dimensional information, a robust tensor model based on an enhanced tensor nuclear norm and a low-rank constraint is proposed. First, the concept of an enhanced partial sum tensor nuclear norm is defined, which greatly improves the flexibility of the tensor nuclear norm and effectively avoids some of the errors introduced by the tensor nuclear norm when approximating the tensor rank. In addition, total variation regularization is introduced into the model, which enables the model to exploit the relationships within the tensor structure to separate the sparse data while attending to the details of the tensor data features.

(4) Because existing biological data analysis models cannot adequately reflect the complementarity between global spatial structure information and local information, a multi-view tensor model based on tensor decomposition and a strong complementarity constraint is proposed to analyze multi-view omics data. The model uses a multi-view tensor to coordinate the omics data and maximally exploit the high-dimensional spatial relationships, so it can comprehensively account for the different characteristics of different omics data. First, the concept of a strong complementarity constraint applicable to omics data is proposed. Second, this constraint is introduced into the model, and the complementarity between omics is used to mine potential local information and improve the separability of different subtypes, thereby fully reflecting the consistency and complementarity of omics data.

The models proposed in this dissertation are applied to multi-view cancer omics datasets. The experimental results show that these models can effectively explore the association between genes and cancers, identify pathogenic feature genes, and achieve good clustering performance.
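The abstract names several standard building blocks without defining them. Below is a minimal Python/NumPy sketch of three of them, written under stated assumptions. The first is an empirical correntropy with an unnormalized Gaussian kernel, whose per-entry weights show why large residuals (outliers) receive small weights. The second is the L2,1-norm and its row-wise shrinkage (proximal) operator, which zeroes entire rows and so produces row sparsity. The third is the tensor-singular-value-decomposition-based tensor nuclear norm with its singular value thresholding operator, computed slice by slice after an FFT along the third mode. The function names and the parameters `sigma` and `tau` are hypothetical. The tensor nuclear norm follows one common convention, averaging over the n3 Fourier-domain slices. This illustrates the generic operators only and is not the dissertation's models or solvers.

```python
import numpy as np


def correntropy(x, y, sigma=1.0):
    """Empirical correntropy: mean Gaussian kernel of x - y (kernel left unnormalized)."""
    r = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.mean(np.exp(-r**2 / (2.0 * sigma**2))))


def correntropy_weights(residual, sigma=1.0):
    """Per-entry Gaussian weights; large residuals (outliers) get weights near 0."""
    r = np.asarray(residual, dtype=float)
    return np.exp(-r**2 / (2.0 * sigma**2))


def l21_norm(X):
    """Sum of the Euclidean norms of the rows of a matrix X."""
    return float(np.sum(np.linalg.norm(X, axis=1)))


def prox_l21(X, tau):
    """Proximal operator of tau * ||X||_{2,1}: shrink each row toward zero.

    Rows whose norm is at most tau are set exactly to zero (row sparsity).
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * X


def tensor_nuclear_norm(T):
    """t-SVD-based TNN: FFT along mode 3, sum slice nuclear norms, divide by n3."""
    n3 = T.shape[2]
    Tf = np.fft.fft(T, axis=2)
    total = sum(np.linalg.svd(Tf[:, :, k], compute_uv=False).sum() for k in range(n3))
    return float(total) / n3


def prox_tnn(T, tau):
    """Tensor singular value thresholding: proximal operator of tau * TNN.

    Each Fourier-domain frontal slice has its singular values shrunk by tau.
    """
    Tf = np.fft.fft(T, axis=2)
    out = np.empty_like(Tf)
    for k in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(Tf[:, :, k], full_matrices=False)
        out[:, :, k] = (U * np.maximum(s - tau, 0.0)) @ Vh
    # The input is real, so the inverse FFT is real up to rounding error.
    return np.real(np.fft.ifft(out, axis=2))


if __name__ == "__main__":
    # An outlier residual of 10 receives a weight far below that of a residual of 0.5.
    print(correntropy_weights([0.0, 0.5, 10.0]))
    # The second row has norm 0.1 <= tau, so it is zeroed out entirely.
    print(prox_l21(np.array([[3.0, 4.0], [0.1, 0.0]]), tau=1.0))
    rng = np.random.default_rng(0)
    T = rng.standard_normal((5, 4, 3))
    print(tensor_nuclear_norm(T), tensor_nuclear_norm(prox_tnn(T, tau=1.0)))
```

In a typical low-rank plus sparse decomposition solved by an ADMM-style scheme, an operator like `prox_tnn` would update the low-rank tensor and one like `prox_l21` would update the sparse component. How the dissertation's models actually combine these steps is not specified in this abstract.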