
Research On Representation Method Based On Low Rank Constraint And Its Application In Biological Sequencing Data

Posted on: 2019-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wang
GTID: 2430330548972617
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of high-throughput sequencing technologies, a large amount of biological sequencing data has been generated. These data contain a wealth of information on gene activity, and analyzing them effectively can reveal much about the regulation of gene expression, which in turn is important for disease prevention and treatment. Biological sequencing data usually consist of many genes and few samples, i.e. they are high-dimension, small-sample-size data. In such large-scale data, only a small percentage of the genes are differentially expressed in ways that can lead to disease. Extracting differentially expressed genes from biological sequencing data is therefore an urgent challenge.

In recent years, low-rank representation (LRR) has been proposed and has attracted a lot of attention. This method expresses the original data matrix as a linear combination of the columns of a dictionary matrix and seeks a coefficient matrix that is low-rank. It also accounts for noise, i.e. it decomposes the original data into a low-rank part and a sparse part, and the two parts are then analyzed separately. This addresses the high-dimensionality problem. In this thesis, the author surveys a large body of domestic and international literature on low-rank representation and successfully applies the method to identifying differentially expressed genes. Building on existing research results, the thesis proposes three methods for selecting differentially expressed genes:

(1) The first method is based on Laplacian mapping, which it introduces into the LRR method. As a nonlinear manifold learning method, Laplacian mapping can recover the low-dimensional manifold structure underlying high-dimensional sampled data. The relationships within the internal structure of the data are also taken into account when the Laplacian matrix is constructed. The method therefore not only addresses the high dimensionality of the data but also makes full use of the information in the data itself. In addition, the sparse matrix is regularized by the L1-norm, which increases robustness to noise and outliers and facilitates the identification of differentially expressed genes.

(2) The second method is based on the truncated nuclear norm. Low-rank representation seeks a coefficient matrix under the dictionary matrix that is low-rank, but optimizing the rank function is NP-hard. Traditional methods usually use the nuclear norm as a convex relaxation of the rank function. In recent years, the truncated nuclear norm has been proposed as a new matrix norm. Compared with the nuclear norm, the truncated nuclear norm sums only the small singular values (i.e. the residual part). During minimization, the variance of the matrix carried by the largest singular values is therefore not minimized, so the identification of principal components is not affected. The truncated nuclear norm approximates the rank function better and can improve the robustness of the algorithm.

(3) The third method is based on the L2,1-norm. To increase robustness to noise and outliers, existing methods usually impose an L1-norm constraint on the sparse matrix. The method proposed in this thesis imposes an L2,1-norm constraint on the sparse matrix instead. The L2,1-norm achieves row sparsity and dimension reduction simultaneously, which can improve the accuracy of identifying differentially expressed genes.

This study helps to complete the theoretical framework of low-rank-constrained representation methods and, at the same time, supports disease prevention and treatment. Results on The Cancer Genome Atlas (TCGA) data illustrate that the above methods are feasible for identifying differentially expressed genes.
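
The three regularizers named in the abstract can be illustrated with a short NumPy sketch. This is not the thesis's implementation: the function names, the truncation parameter `r`, and the toy matrices are assumptions made for illustration. The L2,1-norm is computed over rows, following the abstract's mention of row sparsity (some LRR papers use the column-wise convention instead), and the Laplacian term uses the common graph-regularized form tr(Z L Z^T), which the abstract does not spell out.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of all singular values (the usual convex surrogate for rank)."""
    return np.linalg.svd(M, compute_uv=False).sum()

def truncated_nuclear_norm(M, r):
    """Sum of the singular values left after dropping the r largest ones."""
    s = np.linalg.svd(M, compute_uv=False)  # returned in descending order
    return s[r:].sum()

def l1_norm(M):
    """Sum of absolute values of all entries (entry-wise sparsity)."""
    return np.abs(M).sum()

def l21_norm(M):
    """Sum of the Euclidean norms of the rows (row-wise sparsity)."""
    return np.linalg.norm(M, axis=1).sum()

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_smoothness(Z, W):
    """tr(Z L Z^T): small when columns of Z joined by large weights are similar."""
    return np.trace(Z @ graph_laplacian(W) @ Z.T)

# Toy inputs (assumed for illustration only).
M = np.diag([3.0, 2.0, 1.0])            # singular values 3, 2, 1
E = np.array([[3.0, 4.0], [0.0, 0.0]])  # one nonzero row, one zero row
W = np.array([[0.0, 1.0], [1.0, 0.0]])  # two samples connected by one edge
Z = np.array([[1.0, 0.0]])              # the two samples have different codes

print("nuclear norm:", nuclear_norm(M))
print("truncated nuclear norm (r=1):", truncated_nuclear_norm(M, 1))
print("L1 norm:", l1_norm(E))
print("L2,1 norm:", l21_norm(E))
print("Laplacian smoothness:", laplacian_smoothness(Z, W))
```

With `r = 1`, the largest singular value of `M` is excluded from the penalty, which is the sense in which the truncated nuclear norm leaves the dominant components untouched. The L2,1-norm treats each row of `E` as a group, so a row that is entirely zero contributes nothing to the sum.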
Keywords/Search Tags: differentially expressed genes, the truncated nuclear norm, Laplacian mapping, L2,1-norm, The Cancer Genome Atlas