
Dimensionality Reduction And Subspace Segmentation Of Gene Expression Data

Posted on: 2017-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: H J Chen
Full Text: PDF
GTID: 2310330512471985
Subject: Applied Mathematics
Abstract/Summary:
Tumors pose an increasingly serious threat to human life and health. Studying how tumors form and develop is therefore of great theoretical and practical significance for their prevention, diagnosis and treatment. Tumors usually arise from gene mutations, and DNA microarray technology provides an effective means of studying cancer genes and obtaining gene expression data. However, the particular characteristics of such data usually lead to the "curse of dimensionality" and make analysis inefficient. As a result, gene expression data have attracted wide interest from researchers, and subspace segmentation methods have been applied successfully in many pattern recognition studies. This thesis studies the clustering task from three aspects, taking gene expression data as the object and subspace segmentation as the tool. The three contributions are as follows:

1. Gene expression data have non-linear characteristics, so using the raw data directly for pattern recognition may lose some of the linear and non-linear information that the data contain. This thesis introduces pattern shrinking into the least squares subspace segmentation model in order to improve the utilization and compactness of the data while fully capturing their manifold structure. Experiments on six public data sets demonstrate that the proposed algorithm improves the validity of clustering gene expression data and is suited to non-linear gene expression data.

2. Existing studies of gene expression data have not fully used the information in both dimensions of the data, the samples and the features. An objective function built on the Frobenius norm (F-norm) has several advantages: it is smooth, its derivative is linear, and it is easy to compute and yields simple solutions. We propose a clustering method that obtains useful information from samples and features simultaneously, called Latent Least Square Regression for Subspace Segmentation. Experimental results show that the method is conducive to clustering gene expression data and also achieves better clustering results on data with noise and missing values.

3. Traditional clustering methods struggle to achieve the desired results on gene expression data because the data are high-dimensional with few samples. Projection-based dimension reduction is therefore particularly important for such data. Because the L1 norm emphasizes sparsity and the L2 norm emphasizes the grouping of correlated data, we introduce the trace lasso, which adaptively selects between the two norms, into subspace segmentation and propose Projection Correlation Adaptive Subspace Segmentation. Experimental results show that this method finds the main features and the affinity matrix for subspace segmentation, and thereby improves the accuracy of clustering gene expression data.
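The abstract does not state the least squares subspace segmentation model explicitly. As a rough sketch, assuming the commonly used formulation min_Z ||X - XZ||_F^2 + lam * ||Z||_F^2 with samples stored as columns of X, setting the (linear) gradient to zero gives the closed form Z = (X^T X + lam * I)^{-1} X^T X. The Python code below computes this Z and a symmetric affinity matrix on toy data. The function name `lsr_affinity`, the value of `lam` and the toy data are illustrative assumptions; this is only the baseline model, not the thesis's pattern-shrinking, latent or projection variants.

```python
import numpy as np

def lsr_affinity(X, lam=0.1):
    """Baseline least squares subspace segmentation (assumed formulation).

    X   : array of shape (features, samples), one sample per column.
    lam : ridge weight on ||Z||_F^2 (illustrative default).
    Returns a symmetric, non-negative affinity matrix (samples, samples).
    """
    n = X.shape[1]
    G = X.T @ X                                   # Gram matrix of the samples
    # Closed-form minimizer: solve (G + lam * I) Z = G
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    # Symmetrize so the result can feed a spectral clustering step
    return (np.abs(Z) + np.abs(Z.T)) / 2

# Toy data: two orthogonal 2-D subspaces in R^6, five samples from each.
rng = np.random.default_rng(0)
B1 = np.zeros((6, 2)); B1[0, 0] = 1.0; B1[1, 1] = 1.0
B2 = np.zeros((6, 2)); B2[2, 0] = 1.0; B2[3, 1] = 1.0
X = np.hstack([B1 @ rng.standard_normal((2, 5)),
               B2 @ rng.standard_normal((2, 5))])

W = lsr_affinity(X)
cross = W[:5, 5:].max()     # largest affinity between the two subspaces
within = W[:5, :5].mean()   # average affinity inside the first subspace
```

On this toy example the subspaces are orthogonal, so the affinity between samples from different subspaces vanishes up to rounding and `W` is block diagonal. Real gene expression data will not be this clean; the affinity matrix would normally be passed to a spectral clustering routine to obtain the final grouping.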
Keywords/Search Tags: gene expression data, subspace segmentation, clustering, pattern shrinking, dimension reduction