The Research On Feature Selection And Classification Method Using Gene Expression Profile Data

Posted on: 2017-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: X M He
GTID: 2370330488476208
Subject: Computer technology
Abstract/Summary:
Gene chip technology can quickly measure the expression levels of thousands of genes. With the wide application of microarray technology in cancer research, a large amount of gene expression data has been generated, characterized by high dimensionality and small sample sizes. Mining biological information from gene expression data is a research focus in bioinformatics. Selecting a gene subset with low redundancy and good classification ability from the vast amount of gene expression profile data, and extracting useful information from it, is an important problem. Solving it helps to clarify the significance of tumors at the genetic level and is useful for analyzing their pathogenesis, clinical diagnosis, and treatment. Existing methods therefore need further improvement so that they are better adapted to the characteristics of tumor data and achieve better classification accuracy. This study covers two aspects.

In terms of feature selection, to address the high dimensionality, high noise, and high redundancy of gene expression datasets, we propose a novel gene selection method based on sparse representation and maximum relevance and minimum redundancy (MRMR). The method has two steps. In the first step, the relevance between each gene and the class label is computed using sparse representation; the genes are then ranked by relevance, and the top K genes are selected as informative genes. In the second step, the redundancy between genes is computed using sparse representation, and an improved MRMR method is used to eliminate redundant genes. Unlike previous methods, which treat each gene in isolation, this method fully considers the influence of other genes when computing the relevance of a gene. Experimental results show that the proposed gene selection method achieves the highest performance.

In terms of classification, we propose a novel weighted meta-sample based kernel sparse representation method for classification. In recent years, sparse representation has been shown to have good classification ability. However, sparse representation based classification (SRC) cannot classify well data whose classes are distributed along the same direction, a situation that also occurs in gene expression data. Firstly, we extract a set of meta-samples from the training samples; the weighted meta-samples are obtained by matrix decomposition, using the singular value matrix. Secondly, the data are mapped into a high-dimensional feature space by a linear kernel function. Finally, the sparse representation coefficients are computed and used to obtain the classification result for the test sample. The proposed method is compared with three other classification approaches. Experimental results show that the proposed approach is competitive and achieves higher accuracy.
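
The classification steps described above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis. The function names (omp, weighted_metasamples, fit_dictionary, classify) are hypothetical. Orthogonal matching pursuit is swapped in as a simple sparse solver, since the abstract does not name one. Scaling each meta-sample by its singular value is only one possible reading of "weighted meta-sample". Because the kernel is stated to be linear, the kernel inner product is the ordinary dot product, so the sketch works directly in the input space without building a kernel matrix.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Greedy sparse coding (orthogonal matching pursuit): y ~ D @ w."""
    w = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float)
    for _ in range(min(n_nonzero, D.shape[1])):
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0              # do not reselect an atom
        j = int(np.argmax(corr))
        if corr[j] < 1e-12:                  # residual already explained
            break
        support.append(j)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        w[:] = 0.0
        w[support] = coef
    return w

def weighted_metasamples(X_class, n_meta):
    """Meta-samples of one class from a genes-by-samples matrix.

    Assumption of this sketch: the leading left singular vectors are
    scaled by their singular values to act as the weights.
    """
    U, s, _ = np.linalg.svd(X_class, full_matrices=False)
    return U[:, :n_meta] * s[:n_meta]

def fit_dictionary(X, labels, n_meta):
    """Stack the weighted meta-samples of every class into one dictionary."""
    atoms, atom_labels = [], []
    for c in np.unique(labels):
        M = weighted_metasamples(X[:, labels == c], n_meta)
        atoms.append(M)
        atom_labels.extend([c] * M.shape[1])
    return np.hstack(atoms), np.array(atom_labels)

def classify(D, atom_labels, y, n_nonzero=2):
    """Assign y to the class whose atoms leave the smallest residual."""
    # Atoms stay unnormalized so the weights influence atom selection
    # (a choice made for this sketch).
    w = omp(D, y, n_nonzero)
    classes = np.unique(atom_labels)
    residuals = [
        np.linalg.norm(y - D[:, atom_labels == c] @ w[atom_labels == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]

# Toy data: 4 genes (rows) by 6 training samples (columns), two classes.
X = np.array([
    [5.0, 4.0, 6.0, 0.0, 0.0, 0.0],
    [4.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0, 4.0, 5.0],
    [0.0, 0.0, 0.0, 4.0, 5.0, 6.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

D, atom_labels = fit_dictionary(X, labels, n_meta=2)
# Test sample dominated by the first two genes, like the class-0 samples.
pred = classify(D, atom_labels, np.array([5.0, 5.0, 0.5, 0.0]))
```

In this toy example each class occupies its own pair of genes, so the class-wise residual separates the two classes clearly. Real expression data would need gene selection first and a tuned sparsity level.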
Keywords/Search Tags: Gene expression profile, Gene selection, Tumor classification, Sparse representation, MRMR, Weighted meta-sample