Research On Gene Classification And Analysis Methods Of Gene Expression Data

Posted on:2008-08-20

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L J Cai

Full Text:PDF

GTID:1100360242965213

Subject:Computer application technology

Abstract/Summary:

With the Human Genome Project nearing completion, life science has entered the post-genome era. In this era, the research focus has shifted from individual genes to the functions and dynamics of the whole genome. This new focus demands the capacity to process large quantities of biological information, a demand that rapid advances in computer technology can meet. Bioinformatics has therefore emerged from the integration of computational biology and the computer processing of biological information. It studies how to organize data so as to extract new biological knowledge, against the background of rapid developments in computer science, the Internet, and biological databases.

The gene chip, or microarray, is a recent breakthrough in experimental techniques for molecular biology. Microarrays can measure the expression of thousands of genes simultaneously and thereby generate a large quantity of usable information. Analyzing and organizing these data has become the bottleneck in applying the technique. This dissertation studies gene classification and analysis methods for gene expression data. Its main contributions are as follows:

(1) This dissertation reviews the development of gene classification, microarrays, and common classification algorithms, and evaluates the performance of those algorithms experimentally, providing a theoretical and experimental foundation for the subsequent chapters.

(2) Gene selection is an important problem in gene chip data analysis; it is needed because the number of genes far exceeds the number of samples in an experiment. This dissertation therefore introduces the Ant Colony Optimization (ACO) algorithm into gene selection and uses the correlation between each gene and the class label to initialize the optimization problem, which shortens the search for the optimal solution. The objective function is a linear combination of the sample-discriminating ability of a gene subset and the mean distance between the genes in that subset, which helps locate key genes while eliminating redundancy. Unlike traditional wrapper selection algorithms, this objective function does not require computing the classification accuracy of every candidate gene subset, so computational complexity is reduced and flexibility and adaptability are improved.

(3) Independent Component Analysis (ICA) is a statistical method that can be used for gene classification. However, the separation matrix in ICA is usually estimated with the stochastic gradient or natural gradient algorithm. These gradient-descent-based algorithms are liable to become trapped in local extrema and therefore yield inaccurate results. This dissertation proposes a gene classification algorithm whose basic idea is to replace the gradient-based estimation of the separation matrix in ICA with a genetic algorithm, and then to classify gene expression data, thereby overcoming the inaccuracy. Experimental results show that the procedure produces better classification results.

(4) This dissertation studies the classification of gene expression data from two aspects, the classification algorithm and feature gene selection, and integrates the support vector machine (SVM) and k-nearest neighbor (KNN) algorithms into a new classification algorithm for gene expression data. Given that gene expression data have few samples and high dimensionality, it also proposes an improved correlation-based recursive feature elimination algorithm (C-RFE) that eliminates redundancy in the data. Experimental results show that the new procedure raises classification accuracy and improves the efficiency of feature selection.

(5) Given the characteristics of gene expression data, and the limited applicability and accuracy of individual classifiers for gene classification, this dissertation proposes a new gene classification algorithm: a multi-classifier combination model based on neural-network fusion rules, which compensates for the shortcomings of individual classifiers. Experiments show that the new procedure improves both the accuracy and the applicability of classification.

(6) Cluster analysis has become an important method for analyzing gene expression data, but how to further analyze and interpret clustering results in terms of higher-level biological knowledge remains an open problem in functional genomics. This dissertation proposes a simple algorithm that analyzes clustering results using Gene Ontology (GO) and KEGG metabolic and regulatory pathway annotations, and obtains co-expressed gene sets whose functional annotations are significantly correlated. On that basis, the automatic analysis software SigClust was developed, and its predictive power was tested on a set of gene expression data.
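The abstract does not say how the annotation step in item (6) scores a cluster against GO or KEGG annotations. One widely used approach is an upper-tail hypergeometric test, and the sketch below assumes that test purely for illustration; it is not confirmed to be what SigClust does. The function name `enrichment_p_value` and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, cluster_annotated):
    """Upper-tail hypergeometric probability P(X >= cluster_annotated).

    population        -- total number of genes on the array
    annotated         -- genes in the population carrying the annotation
    cluster_size      -- number of genes in the cluster being tested
    cluster_annotated -- genes in the cluster carrying the annotation

    Assumption: the cluster is treated as a random draw without
    replacement from the population (the usual null hypothesis).
    """
    if not 0 <= annotated <= population:
        raise ValueError("annotated must lie between 0 and population")
    if not 0 <= cluster_size <= population:
        raise ValueError("cluster_size must lie between 0 and population")

    max_hits = min(annotated, cluster_size)
    if cluster_annotated > max_hits:
        return 0.0  # more hits than are possible

    total = comb(population, cluster_size)
    first = max(cluster_annotated, 0)
    # Sum the probability mass for every outcome at least as extreme.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster_size - k)
        for k in range(first, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers, not taken from the dissertation.
    print(enrichment_p_value(10, 4, 3, 2))
```

A small p-value would suggest that the annotation is over-represented in the cluster. In practice many annotations are tested per cluster, so some multiple-testing correction would normally be applied as well.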

Keywords/Search Tags:

Gene Chip or Microarrays, Gene Expression Data, Gene Classification, Ant Colony Optimization Algorithm, Genetic Algorithm, Independent Component Analysis

Related items

1	A Method Study Of Classification And Feature Selection Based On Gene Expression Data
2	Research On Data Mining Methods Of Gene Expression Profile
3	The Research And Realizing Of IGA-FCM Clustering Algorithm In Gene Expression Data Analysis
4	Research Of DNA Microarray Data Classification Based On SVM
5	Research On Gene Expression Data Analysis Method And Its Application
6	Research On Multi_Objective Optimization Algorithm For Biclustering In Microarray Gene Expression Data
7	Clustering Based On Genetic Algorithm For Gene Expression Data
8	Gene Clustering Algorithm Based On Data Dimensionality Reduction Framework
9	Research On 2D Spatial Gene Selection Algorithm Based On Unbalanced Gene Data
10	Research On Weighted Two - Way Clustering Algorithm Based On Gene Expression Microarray Datasets