Research on Sparse Representation-Based Tumor Classification

Posted on: 2016-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: L C Sun
Full Text: PDF
GTID: 2404330473964822
Subject: Computer technology
Abstract/Summary:
DNA microarrays can measure the dynamic expression levels of tens of thousands of genes simultaneously, and these expression values constitute gene expression profile data. An important characteristic of tumor microarray data is that the number of genes is large relative to the number of samples, yet only a small number of genes are relevant to tumor classification. Redundant genes increase computational complexity and usually decrease classification accuracy, so it is important to select informative genes for tumor identification. This thesis focuses on the analysis of tumor gene expression profile data with sparse representation based methods. The main work is as follows.

In sparse representation based classification (SRC), the test sample is coded as a sparse linear combination of all training samples, and the coding residual is usually assumed to follow a Gaussian or Laplacian distribution, which may not describe the coding residual well enough in practical tumor classification. Meanwhile, the sparsity constraint on the coding coefficients makes SRC computationally expensive. We propose a classification model named molecular cancer classification using a meta-sample-based regularized robust coding method (MRRCC), which combines meta-sample-based clustering with a regularized robust coding model. First, meta-samples are extracted from the training set by singular value decomposition, and the test sample is represented as a linear combination of the meta-samples. Assuming that the coding residuals and the coding coefficients are each independent and identically distributed, MRRCC seeks a maximum a posteriori (MAP) solution of the coding problem. An iteratively reweighted regularized robust coding algorithm is proposed that filters out genes with small weights during the iterations, in order to reduce the influence of outliers on the coding coefficients. After the iterations, the test sample is reconstructed for each subclass and assigned to the category with the smallest approximation error. This classification model has high classification precision and relatively low time complexity.

Compared with traditional classification methods, sparse representation based classification avoids the problem of overfitting. However, the sparsity constraint makes the computational cost very high. Studies have shown that it is collaborative representation, not the sparsity constraint, that makes SRC powerful for classification, and that only a small percentage of the representation coefficients have significant values. We therefore propose a new classification model named neighbor-samples-based collaborative representation with regularized least squares for tumor classification. First, the k nearest neighbors of the test sample are found with the k-nearest neighbor method, and the test sample is coded as a linear combination of these neighbor samples. Finally, the test sample is assigned to the subclass with the smallest reconstruction residual. In experiments, the algorithm obtains better results than several sparse representation based methods.
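The neighbor-based collaborative representation steps described above (select k neighbors, code the test sample by regularized least squares, assign by smallest class-wise residual) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `knn_crc_classify`, the Euclidean distance, the ridge closed form, and the values of `k` and `lam` are all choices made here for the example, and the data are synthetic.

```python
import numpy as np


def knn_crc_classify(X_train, y_train, x_test, k=10, lam=0.01):
    """Classify one sample by neighbor-based collaborative representation.

    X_train: (n_samples, n_genes), y_train: (n_samples,), x_test: (n_genes,).
    k and lam are illustrative defaults (assumptions, not from the thesis).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_test = np.asarray(x_test, dtype=float)

    # Step 1: keep the k training samples closest to the test sample
    # (Euclidean distance is an assumption of this sketch).
    dists = np.linalg.norm(X_train - x_test, axis=1)
    idx = np.argsort(dists)[:k]
    D = X_train[idx].T          # dictionary, shape (n_genes, k)
    labels = y_train[idx]

    # Step 2: regularized least squares coding,
    # alpha = argmin ||x - D a||^2 + lam * ||a||^2
    #       = (D^T D + lam I)^{-1} D^T x
    G = D.T @ D + lam * np.eye(D.shape[1])
    alpha = np.linalg.solve(G, D.T @ x_test)

    # Step 3: reconstruct with each class's neighbors only and
    # assign the class with the smallest reconstruction residual.
    residuals = {}
    for c in np.unique(labels).tolist():
        mask = labels == c
        recon = D[:, mask] @ alpha[mask]
        residuals[c] = float(np.linalg.norm(x_test - recon))
    best = min(residuals, key=residuals.get)
    return best, residuals


# Synthetic two-class "expression" data: 20 samples per class, 50 genes.
rng = np.random.default_rng(0)
m0 = np.concatenate([2.0 * np.ones(25), np.zeros(25)])
m1 = np.concatenate([np.zeros(25), 2.0 * np.ones(25)])
X = np.vstack([m0 + 0.05 * rng.normal(size=(20, 50)),
               m1 + 0.05 * rng.normal(size=(20, 50))])
y = np.array([0] * 20 + [1] * 20)
x_new = m1 + 0.05 * rng.normal(size=50)

pred, res = knn_crc_classify(X, y, x_new, k=30, lam=1.0)
print(pred, res)
```

Because the coding step is a ridge problem with a closed-form solution over only k columns, it avoids the iterative l1 minimization that makes SRC expensive, which is the trade-off the abstract points to.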
Keywords/Search Tags: Gene expression profile, Gene filter, Collaborative representation, K-nearest neighbor, Linear combination, Classification