The Research On Gene Expression Profile Data Mining Method Based On Sparse Representation

Posted on: 2015-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: M M Deng
GTID: 2428330488499568
Subject: Computer Science and Technology
Abstract/Summary:
Applying gene expression data to tumor classification is one of the research focuses of bioinformatics. Analyzing gene expression profiles with modern data mining methods can help reveal the mechanisms of tumors, which in turn supports tumor diagnosis and targeted treatment. However, because the number of genes is larger than the number of samples, many classical data mining methods cannot be applied to tumor classification, so a new and efficient data processing method is needed. Sparse representation (SR) is a new and powerful data processing method inspired by recent progress in l1-norm minimization. It is strongly robust and achieves a high recognition rate. This thesis focuses on its application to tumor classification. The main work can be summarized as follows.

First, because the high dimensionality, high noise, and high redundancy of gene expression data sets make many classical classification methods inapplicable to tumor classification, a novel sparse representation based gene selection method is designed to reduce dimensionality and eliminate noise and redundancy. The method has three steps. In the first step, the relevance between each gene and the class labels is computed using sparse representation; the genes are then ranked by relevance and the top K genes are selected as informative genes. In the second step, a maximum similarity tree algorithm, based on the sparse representation relevance measure, clusters the informative genes. In the third step, the most representative gene is selected from each gene cluster to form the final feature gene subset, the one most relevant to the classification task. The proposed gene selection method can achieve the highest performance with the fewest genes.

Second, to address the poor classification performance and generalization of existing classification algorithms, a novel K-SVD based sparse representation method is designed for tumor classification. The method has two stages. In the first stage, the K-SVD algorithm is used to train a dictionary from the training samples of each class, eliminating noise and redundancy and yielding a new training sample dictionary that can represent test samples as sparsely as possible; this improves the representational and discriminative power of the dictionary. In the second stage, an input test sample is represented as a linear combination of the atoms of the new dictionary, and classification is achieved using a discriminant function defined on the representation coefficients. Extensive experiments on seven publicly available gene expression data sets show that this method is effective for tumor classification, achieving better performance than many existing representative schemes.
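
The second stage described above (sparse coding of a test sample over a dictionary, followed by a decision based on the coefficients) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The dictionary is a handful of hand-written toy atoms rather than one trained with K-SVD. The l1-regularized coding problem is solved with plain iterative soft-thresholding (ISTA). The class is chosen by the smallest class-wise reconstruction residual, a common rule in sparse representation classification, assumed here because the abstract does not specify its discriminant function. All names (`sparse_code`, `classify`, `atoms`, `labels`, `lam`) are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal operator of t * |v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(atoms, y, lam=0.01, n_iter=500):
    """Approximately minimize 0.5*||y - D x||^2 + lam*||x||_1 with ISTA.

    `atoms` is a list of dictionary atoms (the columns of D), each a list
    with the same length as the sample `y`.
    """
    n_atoms, dim = len(atoms), len(y)
    # The squared Frobenius norm of D upper-bounds the Lipschitz constant of
    # the gradient, so 1 / lipschitz is a safe (if conservative) step size.
    lipschitz = sum(a * a for atom in atoms for a in atom)
    step = 1.0 / lipschitz
    x = [0.0] * n_atoms
    for _ in range(n_iter):
        # residual = D x - y
        residual = [
            sum(x[j] * atoms[j][i] for j in range(n_atoms)) - y[i]
            for i in range(dim)
        ]
        # gradient of the smooth term = D^T residual
        grad = [
            sum(atoms[j][i] * residual[i] for i in range(dim))
            for j in range(n_atoms)
        ]
        x = [
            soft_threshold(x[j] - step * grad[j], step * lam)
            for j in range(n_atoms)
        ]
    return x


def classify(atoms, labels, y, lam=0.01):
    """Assign y to the class whose atoms reconstruct it with least error.

    Assumption: the residual rule stands in for the unspecified
    discriminant function on the representation coefficients.
    """
    x = sparse_code(atoms, y, lam)
    best_label, best_residual = None, math.inf
    for label in sorted(set(labels)):
        # Reconstruct y using only the coefficients of this class.
        recon = [
            sum(x[j] * atoms[j][i]
                for j in range(len(atoms)) if labels[j] == label)
            for i in range(len(y))
        ]
        residual = math.sqrt(
            sum((y[i] - recon[i]) ** 2 for i in range(len(y)))
        )
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


# Toy dictionary: two atoms per class, three "genes" per sample.
atoms = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
labels = ["A", "A", "B", "B"]

print(classify(atoms, labels, [1.0, 0.05, 0.0]))
```

In this toy setting the test sample lies close to the class "A" atoms, so nearly all of the coefficient weight falls on them and the class "A" residual is the smallest. On real expression data the atoms would typically be normalized to unit length, and `lam` would need tuning.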
Keywords/Search Tags: Gene expression profile, Tumor classification, Sparse representation, Maximum similarity tree, K-SVD