
Research On Tensor Decomposition Method And Its Application In Biological Sequencing Data

Posted on: 2021-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 2430330605963942
Subject: Computer application technology
Abstract/Summary:
Cancer (malignant tumor) is the biggest killer threatening human life and health. With the rapid development of next-generation sequencing, nanotechnology, and biochip technology, a large amount of genomic information has been acquired, and mining this information thoroughly provides a theoretical basis for the prevention and treatment of cancer. Biological sequencing data contain tens of thousands of genes, but pathological changes in cells are related to differentially expressed genes, which account for only a small part of the data. This makes extracting the genes related to cancerous lesions a challenge.

Matrix-based robust principal component analysis represents the original matrix as a linear combination of component matrices and accounts for noise through sparse and low-rank constraints; the decomposed components are then studied to address the high-dimensional data problem. This model cannot fully mine the spatial structure and multi-view information of multi-omics cancer data, which reduces the accuracy of selecting differentially expressed genes. Methods based on third-order tensor decomposition keep the three-dimensional structure of the data intact and can fully mine its hidden information, and they have attracted widespread attention. To address the inability of current matrix decomposition methods to retain the spatial geometric structure of the data, I propose improvements to the sparsity and robustness of the algorithm, building on robust principal component analysis and using biological sequencing data from The Cancer Genome Atlas. The research is divided into the following three parts:

(1) To address the poor perception of spatial geometry, a tensor principal component analysis method with robust characteristics is proposed. The method introduces the tensor structure, applies an L1 penalty to the sparse term, and uses the sparse tensor decomposed from the original tensor to preserve the spatial geometry of the data representation, so that data containing outliers and noise are handled better. Validation on integrated data of multiple types from single cancers in The Cancer Genome Atlas shows that the method extracts differentially expressed genes with a higher degree of enrichment.

(2) To address the low sensitivity to noise in the tensor, a tensor principal component model with a double sparse constraint is proposed. The double sparse constraint on the sparse tensor improves the accuracy of noise separation, and the L2,1 regularization term improves the robustness of the model. First, gene alignment and normalization are applied to the multi-omics cancer data as preprocessing. Second, the original tensor is used as the input, and the model outputs a low-rank tensor and a sparse tensor. Finally, differentially expressed genes are extracted from the sparse tensor output by the model. Experimental results show that the proposed method is fast to solve, converges well, and mines more common feature genes.

(3) To address the problem that the tensor nuclear norm does not approximate the rank function well, a principal component analysis method based on the tensor truncated nuclear norm is proposed. The method uses the truncated nuclear norm to approximate the rank function, which reduces the large error incurred when the tensor nuclear norm is used for this approximation and thereby improves the robustness of the model. In addition, the model uses the L2,1 norm to learn the sparse tensor; the row-sparse constraint better detects outliers in the actual tensor and produces group sparsity, which improves the sparse result. The new method identifies differentially expressed genes from the sparse tensor and classifies samples using the low-rank tensor. Experimental results on simulated data and cancer genomic data show that the proposed method is superior to other methods.
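
The abstract does not give the optimization details, so the following is only a minimal sketch of the standard building blocks that the three penalties it names usually rely on: entrywise shrinkage for the L1 term, row-wise shrinkage for the L2,1 term, singular value thresholding for the nuclear norm, and the truncated nuclear norm itself. It assumes plain matrices (for example, one slice of a third-order tensor) rather than the tensor formulation used in the thesis, and every function and parameter name here (`soft_threshold`, `row_shrink`, `singular_value_threshold`, `truncated_nuclear_norm`, `tau`, `r`) is hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def row_shrink(x, tau):
    """Proximal operator of tau * ||x||_{2,1}: shrink the L2 norm of each row by tau.

    Rows whose norm is at most tau become exactly zero, which gives row (group) sparsity.
    """
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return x * scale


def singular_value_threshold(x, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt


def truncated_nuclear_norm(x, r):
    """Sum of all singular values except the r largest ones."""
    s = np.linalg.svd(x, compute_uv=False)  # returned in descending order
    return float(s[r:].sum())


if __name__ == "__main__":
    # Toy data (an assumption for illustration): rank-2 matrix plus two outlier rows.
    rng = np.random.default_rng(0)
    low_rank = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 20))
    sparse = np.zeros((50, 20))
    sparse[[3, 17], :] = 10.0
    observed = low_rank + sparse

    # The truncated nuclear norm ignores the r largest singular values, so it is
    # close to zero for a matrix whose rank is at most r.
    print(truncated_nuclear_norm(low_rank, 2))
    # Row shrinkage keeps large-norm rows and zeroes out small-norm rows.
    kept = np.count_nonzero(np.linalg.norm(row_shrink(sparse, 5.0), axis=1))
    print(kept)
```

In an alternating scheme of the kind commonly used for robust PCA, operators like these would be applied in turn to update the low-rank and sparse parts; how the thesis combines them, and how they are lifted to third-order tensors, is not stated in the abstract.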
Keywords/Search Tags: Tensor principal component analysis, Truncated nuclear norm, Feature selection, Biological sequencing data