
Research On Extraction Of Feature Gene Subset Based On A Hybrid Between Genetic Algorithm And Support Vector Machines

Posted on: 2008-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: T Liu
Full Text: PDF
GTID: 2178360245997865
Subject: Computer Science and Technology
Abstract/Summary:
Gene chip technology, which combines life sciences and information technology, has become a highly effective method for exploring molecular information. It is increasingly used in bioinformatics, where its main application is measuring the activity of thousands of genes in a single cell sample. Gene chips also provide the precondition for many applications, including gene diagnosis and therapy.

Microarray technology is now widely used in gene expression experiments. A single gene expression result generally contains data for tens of thousands of genes, so many traditional methods run into efficiency problems when processing such high-dimensional, small-sample data. Because microarray data are characterized by high dimensionality, high noise, and high correlation between genes, yet come with relatively few samples, most supervised methods do not give ideal results on them.

This thesis formalizes a robust gene selection approach based on a hybrid of a genetic algorithm and support vector machines. The main goal of the hybridization is to exploit the respective merits of both techniques in determining how many key feature genes there are and how to extract them. The genetic algorithm serves as the search engine for feature genes, and its crossover and mutation parameters are set adaptively. The support vector machine serves as the classifier: the classification accuracy obtained with the selected feature genes is used to compute each individual's fitness in the genetic algorithm. An individual with higher fitness has a lower probability of undergoing crossover or mutation, so the population's average fitness rises gradually.
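The adaptive search described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the adaptation formula, so `adaptive_rate` uses one common linear rule in which the rate falls to zero as an individual's fitness approaches the population maximum. The names `adaptive_rate`, `evolve`, `toy_fitness`, and `INFORMATIVE`, and all parameter values, are hypothetical. The thesis computes fitness from SVM classification accuracy; to keep the sketch dependency-free, a toy scoring function stands in for it here.

```python
import random

def adaptive_rate(fit, f_max, f_avg, k_max):
    """Assumed linear rule: below-average individuals get the full rate
    k_max; above-average ones get less, reaching 0 at the population best."""
    if fit < f_avg or f_max == f_avg:
        return k_max
    return k_max * (f_max - fit) / (f_max - f_avg)

def evolve(fitness_fn, n_genes, pop_size=20, generations=30,
           pc_max=0.9, pm_max=0.1, seed=0):
    """Search binary gene masks (1 = gene selected); n_genes must be >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        fits = [fitness_fn(ind) for ind in pop]
        f_max, f_avg = max(fits), sum(fits) / len(fits)
        for ind, f in zip(pop, fits):          # remember the best mask seen
            if f > best_fit:
                best, best_fit = ind[:], f

        def pick():                            # binary tournament selection
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return a if fits[a] >= fits[b] else b

        nxt = []
        while len(nxt) < pop_size:
            i, j = pick(), pick()
            c1, c2 = pop[i][:], pop[j][:]
            # Fitter parent pairs are crossed over less often.
            pc = adaptive_rate(max(fits[i], fits[j]), f_max, f_avg, pc_max)
            if rng.random() < pc:
                cut = rng.randrange(1, n_genes)
                c1[cut:], c2[cut:] = c2[cut:], c1[cut:]
            # Children of fitter parents are mutated less often.
            for child, parent_fit in ((c1, fits[i]), (c2, fits[j])):
                pm = adaptive_rate(parent_fit, f_max, f_avg, pm_max)
                for g in range(n_genes):
                    if rng.random() < pm:
                        child[g] ^= 1
                nxt.append(child)
        pop = nxt[:pop_size]
    return best, best_fit

# Toy stand-in for SVM accuracy (an assumption, not the thesis's fitness):
# reward hypothetical "informative" genes and penalize subset size.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    return sum(mask[g] for g in INFORMATIVE) - 0.1 * sum(mask)

if __name__ == "__main__":
    mask, score = evolve(toy_fitness, n_genes=10)
    print(mask, score)
```

In a real setting, `toy_fitness` would be replaced by a function that trains an SVM on the genes selected by the mask and returns its cross-validated accuracy.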
The DLBCL (Diffuse Large B-Cell Lymphoma) dataset and the prostate cancer dataset, processed with a t-test (with a 0.05 parameter), were used to verify the hybrid's performance on high-dimensional microarray expression data. The hybrid with adaptive parameters achieved high accuracy and converged well. The resulting gene subset is biologically meaningful, which can be confirmed by searching the biotechnology references on the NCBI (National Center for Biotechnology Information) site.
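The abstract does not say how the t-test was applied. One plausible reading, assumed here, is a per-gene prefilter that keeps only genes whose expression differs significantly between the two sample classes at the 0.05 level. The sketch below uses a pooled-variance two-sample t statistic; the names `pooled_t` and `prefilter` are hypothetical, and the caller supplies the two-sided critical value `t_crit` for the appropriate degrees of freedom (total samples minus 2) from a t table.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance.
    Each group needs at least two values."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    if se == 0:  # both groups constant: no spread to compare against
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def prefilter(expr, labels, t_crit):
    """expr: one list of expression values per gene (one value per sample).
    labels: 0 or 1 per sample. Returns indices of genes with |t| > t_crit."""
    keep = []
    for g, row in enumerate(expr):
        a = [x for x, y in zip(row, labels) if y == 0]
        b = [x for x, y in zip(row, labels) if y == 1]
        if abs(pooled_t(a, b)) > t_crit:
            keep.append(g)
    return keep

# Made-up example: 2 genes, 8 samples (4 per class), so df = 6.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expr = [
    [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # differs between classes
    [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],  # identical in both classes
]
selected = prefilter(expr, labels, t_crit=2.447)  # two-sided 0.05, df = 6
```

Filtering first would shrink the search space that a genetic algorithm has to explore, which is why such a step is commonly placed before wrapper-style feature selection.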
Keywords/Search Tags: gene chip, microarray, feature gene, genetic algorithm, support vector machine