A New Hybrid Genetic Gene Clustering Method

Posted on:2015-03-18

Degree:Master

Type:Thesis

Country:China

Candidate:H N Wang

Full Text:PDF

GTID:2180330467484602

Subject:Computational Mathematics

Abstract/Summary:

PDF Full Text Request

With the development of DNA microarrays, a large number of gene expression data have been obtained. Mining some valuable information from these existing data can have great significance for exploring the function of genes and even some cellular processes. As a way of analyzing these data, clustering methods have broad applicatons in the field of bioinformatics, especially for a current research hot topic—gene clustering problem. We use clustering algorithm to find some similar genes, and thus infer lots of unknown functions of genes based on genes of the same group. K-Means algorithm is one of the well-known clustering algorithms. It uses a classical gradient descent strategy and splits data sets with an iterative relocation manner. In addition, it gives the clustering results quickly. However, it is sensitive to the initial cluster formation and is easily trapped in a suboptimal partition.Using genetic algorithms (GA) to search the entire solution space of gene clustering problem for the optimal partition can significantly improve the final results. However, the traditional crossover operator may produce empty classes, which lead to a lot of double countings. Therefore, the GA employed directly will pay a high computational cost, especially for the large-scale gene expression data, and each cluster centroid will converge slowly. Genetic k-means algorithm (GKA) maintains the GA framework and utilizes K-Means as a search operator instead of crossover for partial update. This mixed algorithm significantly remedys the defect of the slow convergence and gives a globally optimal partition of gene expression data with a specified number of clusters when dealing with the data of low dimension. However, the convergence speed of GKA is still not saticfactory when dealing with some higher dimensions. In order to obtain a more comprehensive clustering algorithm, we use an XK-Means algorithm with perturbation for gene clustering. To save the time for selecting rational disturbances, we propose a novel gene expression clustering method named as IXK-Means to avoid empty classes. Furthermore, we follow the hybrid approach of GKA and propose a novel hybrid genetic algorithm named as GXKA which gives a globally optimal partition of a given data with a specified number of clusters.In this paper, we first introduce some basic clustering methods in terms of division and GA for clustering. Then we describe the calculation process of GXKA in details in chapter III. Finally, using finite Markov chain theory, we prove that the GXKA converges to the global optimum. Combining the theoretical results with the numerical experiments, we reach the following conclusions:(1) Under certain conditions, GXKA converges to the global optimum with probability one;(2) In terms of the three evaluation indexes (MSE, Homogeneity, Separation), the clustering results of IXK-Means is superior to that of XK-Means;(3) Compared to that of GKA, the convergence rate of GXKA is significantly improved:It can converge to the stable MSE of GKA in only half of the time, effectively reduce the time complexity of the genetic framework.

Keywords/Search Tags:

Gene Clustering, Genetic Algorithm, IXK-Means, Global Optimization, Bioinformatics

PDF Full Text Request

Related items

1	Clustering Based On Genetic Algorithm For Gene Expression Data
2	Joint Location-inventory Model Based On K-means Clustering Genetic Algorithm
3	Gravitation Field Algorithm And Its Application In Bioinformatics
4	Research On K-means Clustering Algorithm And Its Application In Stock Investment
5	Some Researching On Fuzzy C-Means And Fuzzy C-Lines Clustering Algorithms
6	Complex Network Clustering Based On Multi-objective Optimization Algorithm
7	The Study And Application Of Some Issues For Cluster Analysis
8	The Research And Realizing Of IGA-FCM Clustering Algorithm In Gene Expression Data Analysis
9	Multi-objective Optimization Method For Large-scale Evaluation Problems
10	Study On Model And Algorithm Of Emergency Material Distribution