Research On Feature Selection For Gene Expression Data

Posted on:2012-06-05

Degree:Master

Type:Thesis

Country:China

Candidate:Q P Zhu

Full Text:PDF

GTID:2120330338494805

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:


Gene microarray technology is a new and highly influential molecular biology technique. Microarrays make it feasible to obtain large amounts of gene expression data, allowing researchers to understand gene expression patterns at the molecular level and to study biological phenomena from a microscopic perspective. However, such datasets have several characteristic traits: small sample sizes, high dimensionality, heavy noise, a large number of redundant genes, and uneven distributions. Choosing an appropriate method to reduce the feature dimension and select representative genes is therefore an important preprocessing step.

Because gene expression data have few samples, are unevenly distributed and noisy, and do not follow a normal distribution, this thesis proposes two estimators based on the theory of robust statistics. These statistics not only take the information of the overall sample into account but also avoid over-reliance on normal-model assumptions. Experiments show that applying these estimators to the t-statistic algorithm for selecting differentially expressed genes yields better classification accuracy.

The support vector machine (SVM) is a classification technique based on structural risk minimization, and the L-J algorithm is a feature selection algorithm based on SVM classification. According to Karhunen-Loève (K-L) transform theory, any vector can be expressed as the sum of its components in an orthogonal space. The improved algorithm therefore uses the components of the separating hyperplane's gradient vector along each axis instead of computing the angle between the gradient vector and each axis, and it achieves the same effect as the L-J algorithm.

Gene expression data contain many redundant genes, and a large number of redundant genes degrades classification results.
This thesis proposes a method that maps each gene to a vector in feature space based on correlation coefficients and clusters these vectors according to certain rules. Representative genes are then selected from the resulting clusters to form the feature subset. Experiments show that the algorithm reduces the feature dimension and improves classification results.

A genetic algorithm is an intelligent search algorithm suited to large datasets. Taking full account of the characteristics of gene expression data, this thesis proposes an improved genetic algorithm for feature selection that combines the genetic algorithm, an immune algorithm, filtering, heuristic methods, and SVM classification. The feature subset obtained by this algorithm has stronger classification ability.
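The abstract does not state which two robust estimators the thesis uses. As a minimal sketch, assuming a median/MAD substitution (a common robust choice, not necessarily the thesis's estimators), a robust t-statistic for each gene can be formed by replacing the class means with medians and the standard deviations with scaled median absolute deviations. The names `robust_t`, `classical_t`, and `select_top_k` are hypothetical.

```python
import numpy as np

MAD_SCALE = 1.4826  # makes the MAD consistent with the SD for normal data


def robust_scale(x):
    """Scaled median absolute deviation of each column (gene)."""
    med = np.median(x, axis=0)
    return MAD_SCALE * np.median(np.abs(x - med), axis=0)


def robust_t(X, y):
    """Median/MAD analogue of Welch's t for every gene.

    X: samples x genes expression matrix; y: 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    diff = np.median(a, axis=0) - np.median(b, axis=0)
    se = np.sqrt(robust_scale(a) ** 2 / len(a) + robust_scale(b) ** 2 / len(b))
    return diff / np.maximum(se, 1e-12)  # guard against a zero MAD


def classical_t(X, y):
    """Ordinary Welch t-statistic for comparison."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / se


def select_top_k(scores, k):
    """Indices of the k genes with the largest |statistic|."""
    return np.argsort(-np.abs(scores))[:k]
```

On a toy gene whose second class contains a single extreme value, the classical statistic shrinks toward zero because the outlier inflates the variance, while the median/MAD version still reports a large shift.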
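For a linear SVM with decision function f(x) = w·x + b, the gradient of f is w itself, so the cosine of the angle between the gradient and axis i is w_i / ||w||. Because arccos is monotonic, ranking genes by |w_i| gives the same order as ranking them by that angle, which is consistent with the abstract's claim that the component-based variant matches the L-J algorithm. The sketch below assumes a linear kernel and trains the SVM with plain subgradient descent on the hinge loss; it is not the L-J algorithm itself, whose details the abstract does not give, and all function names and parameter values are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + C*sum(hinge loss) by full-batch subgradient
    descent. Labels y must be in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def rank_by_component(w):
    """Gene indices ordered by |w_i|, the gradient's component on axis i."""
    return np.argsort(-np.abs(w), kind="stable")


def rank_by_angle(w):
    """Gene indices ordered by the angle between the gradient w and axis i
    (treating +e_i and -e_i alike); a smaller angle means a more important gene."""
    cos = np.abs(w) / np.linalg.norm(w)
    angles = np.arccos(np.clip(cos, 0.0, 1.0))
    return np.argsort(angles, kind="stable")
```

The component ranking avoids the norm and the arccos call entirely, which is the saving the abstract describes; for a nonlinear kernel the gradient would vary with x and this shortcut would need more care.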
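The abstract leaves the clustering rule unspecified. A minimal sketch, assuming each gene's vector is its row of the gene-by-gene Pearson correlation matrix and assuming a greedy leader rule with a hypothetical threshold of 0.9, is shown here; the relevance scores (for example |t| from a filter step) and the function names are illustrative assumptions.

```python
import numpy as np


def correlation_vectors(X):
    """X: samples x genes. Row i of the result is gene i's vector of
    Pearson correlations with every gene."""
    return np.corrcoef(X, rowvar=False)


def leader_cluster(V, relevance, threshold=0.9):
    """Greedy leader clustering of the gene vectors in V.

    Genes are visited from most to least relevant. A gene joins the first
    cluster whose leader's vector correlates with its own at >= threshold;
    otherwise it starts a new cluster. Each leader is therefore the most
    relevant gene of its cluster and serves as the representative.
    """
    order = np.argsort(-relevance, kind="stable")
    leaders, clusters = [], []
    for g in order:
        for k, lead in enumerate(leaders):
            if np.corrcoef(V[g], V[lead])[0, 1] >= threshold:
                clusters[k].append(int(g))
                break
        else:
            leaders.append(int(g))
            clusters.append([int(g)])
    return leaders, clusters
```

Keeping only the leaders drops near-duplicate genes while retaining the most relevant member of each redundant group.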
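The hybrid algorithm's extra components (immune operators, filtering, heuristics) are not described in the abstract, so the sketch below shows only a plain genetic-algorithm core for wrapper feature selection: binary masks over genes, tournament selection, uniform crossover, bit-flip mutation, and elitism. To stay dependency-free it swaps the SVM for a nearest-centroid classifier and uses training accuracy minus a size penalty as fitness; these choices, the parameter values, and all names are assumptions.

```python
import numpy as np


def nearest_centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the selected
    genes (a stand-in for the SVM used in the thesis)."""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[np.argmin(dists, axis=1)]
    return float(np.mean(pred == y))


def fitness(X, y, mask, alpha=0.01):
    """Reward accuracy and penalize large gene subsets."""
    return nearest_centroid_accuracy(X, y, mask) - alpha * mask.sum()


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = rng.random((pop_size, d)) < 0.5  # random initial gene masks
    for _ in range(generations):
        scores = np.array([fitness(X, y, m) for m in pop])
        elite = pop[np.argmax(scores)].copy()
        # Binary tournament selection.
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[i] >= scores[j])[:, None], pop[i], pop[j])
        # Uniform crossover with a neighbouring parent.
        partners = np.roll(parents, 1, axis=0)
        take = rng.random((pop_size, d)) < 0.5
        children = np.where(take, parents, partners)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, d)) < p_mut
        children[0] = elite  # elitism: keep the best mask found so far
        pop = children
    scores = np.array([fitness(X, y, m) for m in pop])
    return pop[np.argmax(scores)], float(scores.max())
```

Training accuracy is an optimistic fitness on small microarray samples; a cross-validated score would be the more careful choice in practice.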

Keywords/Search Tags:

microarray gene dataset, feature selection, robust statistics, support vector machine (SVM), clustering, genetic algorithm (GA)


Related items

1	Researches On Gene Selection Algorithm With Support Vector Machine
2	Modeling And Optimization Of Gene Microarray Data Classification Based On Intelligent Optimization
3	Research Of DNA Microarray Data Classification Based On SVM
4	The Research On Hybrid Significant Genes Selection Base On Heuristic Clustering
5	Study Of Several Optimization Algorithms For Support Vector Machine
6	Study Of GPS Height Conversion In Region-wide
7	Microrna Prediction Using SVM Based On Imbalance Dataset
8	Study Of Algorithms For Support Vector Machine
9	Prediction Of Plant MicroRNA Using Support Vector Machine
10	Research And Application Of Liuzhou Precipitation Model Based On Genetic Algorithm And Support Vector Machine