
Research Of Feature Selection And Extraction For Gene Expression Data

Posted on: 2015-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Liu
Full Text: PDF
GTID: 2180330431989205
Subject: Control theory and control engineering
Abstract/Summary:
With the development of the Human Genome Project, DNA microarray technology has been widely applied in many areas of the life sciences. Gene expression data allow researchers to understand gene expression patterns at the molecular level, and their use makes cancer diagnosis more accurate. In studying gene expression data, researchers have found that only a small number of genes in these high-dimensional data play key roles in cancer recognition. The large number of redundant genes not only causes a severe "curse of dimensionality" but also degrades cancer classification performance. Choosing appropriate methods to reduce the dimensionality of gene expression data and to select a representative combination of genes has therefore become very important work.

This thesis is supported by the Natural Science Foundation of Zhejiang Province (Y1080950) and the National Natural Science Foundation (60905034). The main research and findings are as follows:

1. This thesis proposes two feature selection methods: PSO-Selection, based on Particle Swarm Optimization, and KPSO-Selection, based on K-means clustering and Particle Swarm Optimization. PSO-Selection is a filter method that scores feature quality with a fitness function defined as the ratio of between-class to within-class distance. KPSO-Selection is a hybrid method: K-means first clusters the genes into a fixed number of clusters, and the better-performing clusters are selected to form a gene pool; wrapper feature selection based on PSO and ELM then selects the key genes from that pool. Experimental results show that both methods achieve more accurate cancer diagnosis and prediction while using fewer genes.

2. Applying feature extraction to gene expression data reduces the number of features while still achieving high classification accuracy.
This thesis focuses on ICA feature extraction. Traditional ICA is an unsupervised method, which means that it cannot use the class information of the samples. A discriminant function is therefore applied to ICA so that it becomes a supervised extraction method (DICA). Three different discriminant functions are used, and extensive experimental results show that DICA has better classification performance than traditional ICA.

3. Feature selection and feature extraction both aim to obtain the smallest subset of features that still gives good classification and identification performance. To represent the original model, feature selection picks a small subset of the original features, whereas feature extraction picks a small subset of new features, where the new features are obtained from the original features by an affine transformation. In both cases the resulting subset is meant to represent the original model as fully as possible. The relationship between feature selection and feature extraction is studied in this thesis; theoretical analysis and experiments show that the two approaches are equivalent to a certain extent. In addition, a new feature selection method using ICA and information gain is proposed, and the experimental results show that the subset of key genes it obtains is very effective.
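
As an illustration of the filter criterion described in item 1, the following is a minimal sketch of a fitness function based on the ratio of between-class to within-class distance. The abstract does not give the exact formula, so the scatter-based definition used here (squared Euclidean distances to class means and to the overall mean) is an assumption, and the names `distance_ratio_fitness`, `X`, `y`, and `mask` are hypothetical. In a PSO-based search, each particle would encode one candidate `mask` and this score would be maximized.

```python
import numpy as np


def distance_ratio_fitness(X, y, mask):
    """Score a gene subset by between-class / within-class distance.

    X    : (n_samples, n_genes) expression matrix
    y    : (n_samples,) class labels
    mask : (n_genes,) boolean vector marking the selected genes

    Assumed definition (not taken from the thesis): between-class distance
    is the size-weighted squared distance of each class mean to the overall
    mean; within-class distance is the summed squared distance of samples
    to their own class mean.
    """
    Xs = X[:, mask]
    overall_mean = Xs.mean(axis=0)
    between = 0.0
    within = 0.0
    for label in np.unique(y):
        Xc = Xs[y == label]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * np.sum((class_mean - overall_mean) ** 2)
        within += np.sum((Xc - class_mean) ** 2)
    # Small constant avoids division by zero when a class has no spread.
    return between / (within + 1e-12)


# Toy data: gene 0 separates the two classes, gene 1 does not.
X_demo = np.array([[0.0, 5.0], [0.1, -5.0], [10.0, 5.0], [10.1, -5.0]])
y_demo = np.array([0, 0, 1, 1])
informative = distance_ratio_fitness(X_demo, y_demo, np.array([True, False]))
uninformative = distance_ratio_fitness(X_demo, y_demo, np.array([False, True]))
```

On this toy data the subset containing only gene 0 scores far higher than the subset containing only gene 1, which is the behavior a filter criterion of this kind is meant to reward.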
Keywords/Search Tags: Gene expression data, Feature selection, Feature extraction, Discriminant Function