Clustering Based On Genetic Algorithm For Gene Expression Data

Posted on: 2009-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: T F Liu
Full Text: PDF
GTID: 2120360245499134
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
In recent years, clustering algorithms have been applied effectively to gene expression data analysis in molecular biology. With advances in microarray technology, it is now possible to observe the expression levels of thousands of genes simultaneously while cells experience specific conditions or undergo specific processes.

Clustering is a key step in the analysis of gene expression data, and the k-means algorithm is the most widespread method in cluster analysis. We studied the effects of different measuring metrics and data preprocessing methods on k-means clustering for different gene expression datasets. The results showed that different preprocessing methods made significant differences under different measuring metrics. For the time-course gene expression dataset, the best choice in k-means clustering was log transformation as the preprocessing and the covariance metric as the measuring metric. For the other datasets, log transformation was also the best preprocessing, and the three measuring metrics led to better results.

The critical shortcoming of the k-means algorithm is its sensitivity to the initial values, which makes it easy to fall into a local optimum. The genetic algorithm is a method of searching for the best solution by imitating natural evolution; its notable features are implicit parallelism and the capacity to use global information effectively. For this reason, a new k-means clustering algorithm based on the genetic algorithm (GKA) was proposed, which had good global and local search capability. The integration of the k-means algorithm and the genetic algorithm is denoted IKGA. Tests on three yeast datasets showed that IKGA performed better than GKA. We also used IKGA to study pig gene expression data. The results likewise showed that IKGA was more effective at avoiding the influence of the initial values on the clustering and at reducing the total within-cluster variation (TWCV).
Keywords/Search Tags:Gene expression data, Clustering analysis, K-means Algorithm, Genetic Algorithm
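
The abstract does not give the details of GKA or IKGA, so the following is only a minimal sketch of the general idea it describes: log-transform the expression values, score a clustering by its TWCV, and let a genetic loop (selection, mutation, then one k-means reassignment step) search over cluster labelings. All function names and parameters (`log_transform`, `twcv`, `kmeans_step`, `genetic_kmeans`, `pop_size`, `p_mut`, and so on) are hypothetical, and squared Euclidean distance is assumed here in place of the covariance metric mentioned above.

```python
import math
import random

def log_transform(rows):
    """Log2-transform every expression value (values assumed positive)."""
    return [[math.log2(v) for v in row] for row in rows]

def centroids(data, labels, k):
    """Mean vector of each cluster; an empty cluster yields None."""
    dims = len(data[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for row, c in zip(data, labels):
        counts[c] += 1
        for j, v in enumerate(row):
            sums[c][j] += v
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]

def twcv(data, labels, k):
    """Total within-cluster variation: squared distance to own centroid."""
    cents = centroids(data, labels, k)
    return sum((v - cents[c][j]) ** 2
               for row, c in zip(data, labels)
               for j, v in enumerate(row))

def kmeans_step(data, labels, k):
    """One k-means step: reassign each row to its nearest non-empty centroid."""
    cents = centroids(data, labels, k)
    live = [c for c in range(k) if cents[c] is not None]

    def nearest(row):
        return min(live, key=lambda c: sum((v - m) ** 2
                                           for v, m in zip(row, cents[c])))
    return [nearest(row) for row in data]

def genetic_kmeans(data, k, pop_size=10, generations=20, p_mut=0.05, seed=0):
    """Assumed hybrid loop, not the thesis's algorithm: keep the best labeling seen."""
    rng = random.Random(seed)
    n = len(data)
    score = lambda s: twcv(data, s, k)
    # Each individual is a list of cluster labels, one per gene.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=score)
    for _ in range(generations):
        # Selection: keep the half of the population with the lowest TWCV.
        pop.sort(key=score)
        parents = pop[: max(1, pop_size // 2)]
        children = []
        for parent in parents + parents:
            # Mutation: occasionally move a gene to a random cluster.
            child = [rng.randrange(k) if rng.random() < p_mut else g
                     for g in parent]
            # Local search: one k-means reassignment step.
            children.append(kmeans_step(data, child, k))
        pop = children[:pop_size]
        candidate = min(pop, key=score)
        if score(candidate) < score(best):
            best = candidate
    return best, score(best)
```

A call such as `genetic_kmeans(log_transform(expression_rows), k=4)` would return a labeling and its TWCV, where `expression_rows` is a hypothetical list of per-gene expression profiles. Because the best labeling seen so far is always retained, the returned TWCV can never be worse than that of the best random starting labeling, which is the sense in which such a hybrid reduces dependence on the initial values.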