
Research On Gene Classification And Analysis Methods Of Gene Expression Data

Posted on: 2008-08-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Cai
Full Text: PDF
GTID: 1100360242965213
Subject: Computer application technology
Abstract/Summary:
With the Human Genome Project nearing completion, life science has entered the post-genome era. In this era, the research focus has shifted from individual genes to the functions and dynamics of the whole genome. This new focus demands the capacity to process large quantities of biological information, and the rapid development of computer technology can meet this demand. Bioinformatics has therefore emerged from the integration of computational biology and the computer processing of biological information. It studies how to organize data to extract new biological knowledge, in the context of the great development of computer science, the Internet, and various biological databases.

The gene chip, or microarray, is a recent breakthrough in the experimental techniques of molecular biology. Microarrays can measure the expression of thousands of genes simultaneously and thereby generate a large quantity of information. Analyzing and organizing these data has been the bottleneck in using this technique. This paper studies the classification of genes and the analysis methods for gene expression data. Its contributions are as follows:

(1) This paper introduces the development of gene classification, microarrays, and common classification algorithms, and evaluates their performance through experiments to provide a theoretical and experimental foundation for the subsequent chapters;

(2) Gene selection is an important problem in gene chip data analysis; it is needed because the number of genes is far greater than the number of samples in an experiment.
Therefore, this paper introduces the Ant Colony Optimization (ACO) algorithm into the field of gene selection, and uses the values obtained from the correlation analysis between each gene and its class to initialize the optimization problem, thus shortening the time needed to search for the optimal solution. The objective function is a linear combination of the sample-discriminative ability of the gene subset and the mean distance between genes in the subset, which helps locate the key genes while eliminating redundancy. Unlike traditional wrapper selection algorithms, the objective function does not require the classification accuracy of every gene subset, so the computational complexity is effectively reduced, with enhanced flexibility and adaptability.

(3) Independent Component Analysis (ICA) is a statistical procedure for gene classification. However, the separation matrix in ICA is mainly estimated with the stochastic gradient algorithm or the natural gradient algorithm. These algorithms, which are based on gradient descent, are liable to fall into local extrema and thus yield inaccurate results. This paper proposes a gene classification algorithm based on the genetic algorithm; its fundamental idea is to replace the separation matrix estimation algorithm in ICA with a genetic algorithm when classifying gene expression data, thereby overcoming the inaccuracy of the results. Experimental results show that this classification procedure produces better classification results;

(4) This paper studies the classification of gene expression data from two aspects, the classification algorithm and feature gene selection, and integrates the SVM algorithm and the KNN algorithm into a new classification algorithm for gene expression data.
In light of the small sample size and high dimensionality of gene expression data, this paper proposes an improved correlation-based recursive feature elimination algorithm (C-RFE), which successfully eliminates redundancy in the data. Experimental results show that the new procedure effectively raises classification accuracy and improves the efficiency of feature selection;

(5) In view of the features of gene expression data, and the limited applicability and accuracy of individual classifiers for gene classification, this paper proposes a new gene classification algorithm, a multi-classifier combination model based on fusion rules in neural networks, which remedies the inadequacy of individual classifiers. Experiments show that this new procedure improves the accuracy and applicability of classification;

(6) Cluster analysis has become an important procedure for analysing gene expression data, but how to further analyse and explain cluster analysis results in terms of higher-level biological knowledge is still a problem in functional genomics research. This paper proposes a simple algorithm that analyses the clustering results with the help of GO and KEGG metabolic regulation pathway annotation and obtains sets of co-expressed genes that are significantly correlated in their functional annotation. On that basis, we have developed the automatic analysis software SigClust and tested its predictive power on a set of gene expression data.
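
Item (2) describes the gene-selection objective function only in words. As a rough illustration, the sketch below scores a gene subset as a weighted sum of its relevance to the class labels and the mean pairwise distance between its genes. The abstract does not define either term, so the choices here are assumptions: relevance is taken as the mean absolute Pearson correlation between each gene and the labels, and distance as 1 minus the absolute correlation between two genes. The weights `alpha` and `beta`, the function names, and the toy data are all hypothetical. The ant colony search itself is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def subset_score(expr, labels, subset, alpha=1.0, beta=1.0):
    """Score a gene subset as alpha * relevance + beta * mean pairwise distance.

    expr   -- dict mapping gene name to its expression values across samples
    labels -- numeric class label of each sample
    subset -- gene names to evaluate
    (Assumed definitions; the abstract does not give the exact formula.)
    """
    # Relevance: mean |correlation| between each selected gene and the class labels.
    relevance = sum(abs(pearson(expr[g], labels)) for g in subset) / len(subset)
    # Distance: mean of (1 - |correlation|) over all gene pairs in the subset.
    pairs = list(combinations(subset, 2))
    if pairs:
        distance = sum(1 - abs(pearson(expr[a], expr[b])) for a, b in pairs) / len(pairs)
    else:
        distance = 0.0
    return alpha * relevance + beta * distance


# Toy data: six samples, three per class (hypothetical values).
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # informative
    "g2": [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # exactly 2 * g1, so redundant with g1
    "g3": [4.0, 6.0, 5.0, 2.0, 1.0, 3.0],  # informative, but not a copy of g1
}

redundant = subset_score(expr, labels, ["g1", "g2"])
diverse = subset_score(expr, labels, ["g1", "g3"])
print(f"redundant pair: {redundant:.3f}, diverse pair: {diverse:.3f}")
```

On this toy data the pair containing a duplicated gene scores lower than the pair of distinct informative genes, which is the behaviour the distance term is meant to encourage. No classifier is trained at any point, in line with the remark that the objective does not need the accuracy of every subset.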
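
Item (6) does not say how the association between a cluster and an annotation is judged to be significant. One common way to test whether a GO term or KEGG pathway is over-represented in a cluster is the hypergeometric test. The sketch below uses it as a stand-in and should not be read as a description of SigClust. The function name, parameter names, and numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population   -- number of genes on the array
    annotated    -- number of those genes carrying the annotation term
    cluster_size -- number of genes in the cluster
    hits         -- number of cluster genes carrying the term
    """
    total = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    # Count the ways to draw k annotated and (cluster_size - k) unannotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy case: 20 genes in total, 5 annotated with the term, and a cluster of
# 5 genes of which 4 carry the term.
p = enrichment_p_value(population=20, annotated=5, cluster_size=5, hits=4)
print(f"p = {p:.4f}")  # 76 / 15504, about 0.0049
```

A small p-value suggests that the term occurs in the cluster more often than chance would explain. When many terms are tested per cluster, a multiple-testing correction would normally be applied as well.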
Keywords/Search Tags: Gene Chip or Microarrays, Gene Expression Data, Gene Classification, Ant Colony Optimization Algorithm, Genetic Algorithm, Independent Component Analysis