With the rapid development of high-throughput technologies, large amounts of gene expression data have been produced. Processing these data and extracting biological or medical knowledge from them has become a research hotspot in the post-genomic era. Clustering, one of the main techniques of data mining, has been widely applied to the processing of gene expression data.

Although cluster analysis provides some evaluation criteria for clustering results, optimizing these criteria is a combinatorial optimization problem. Currently, only suboptimal iterative algorithms are available, and these may not converge to the globally optimal solution. In addition, the initial values and the choice of similarity function affect the clustering result.

In this paper, the input of the clustering algorithm, Affinity Propagation clustering, and particle swarm clustering are studied using gene expression data. The main work is as follows:

1. Using standard gene expression data sets, the effect of the choice of similarity metric and data transformation on the performance of K-means clustering and Affinity Propagation clustering was studied. Both algorithms are best suited to Euclidean distance and log-transformed data. For the leukemia data, Affinity Propagation clustering was run with different numbers of informative genes; the best number of informative genes was 1000.

2. The ability of information entropy to validate gene clustering results was investigated. The results show that information entropy can estimate the number of clusters and reflect the quality of clustering results, which makes it a useful validation measure.

3. Two improved particle swarm optimization (PSO) clustering algorithms, one based on K-means and one based on Affinity Propagation (APPSO), are proposed, providing new ideas and methods for cluster analysis. First, the proposed algorithms obtain initial cluster centers using K-means and Affinity Propagation, respectively.
Second, the obtained initial cluster centers are used as the input for one of the particles instead of being assigned randomly. Finally, the two improved PSO clustering algorithms are compared experimentally with PSO; the improved algorithms are shown to achieve not only high accuracy but also a degree of stability.
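
The seeding idea in item 3 can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the fitness function (mean squared distance to the nearest center), the PSO coefficients `w`, `c1`, `c2`, the helper names, and the synthetic two-cluster data are all assumptions made for illustration. Only K-means seeding is shown; exemplars from Affinity Propagation could be passed through the same hypothetical `seed_centers` argument.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantization_error(centers, data):
    """Mean squared distance from each point to its nearest center (assumed fitness)."""
    return sum(min(sq_dist(p, c) for c in centers) for p in data) / len(data)

def kmeans(data, k, rng, iters=20):
    """Plain Lloyd's K-means, used here only to produce seed centers."""
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def pso_cluster(data, k, seed_centers=None, n_particles=10, iters=50,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """PSO clustering; each particle encodes k candidate centers as a flat list."""
    rng = random.Random(seed)
    dim = len(data[0])

    def unflatten(pos):
        return [pos[i * dim:(i + 1) * dim] for i in range(k)]

    def fitness(pos):
        return quantization_error(unflatten(pos), data)

    # Random initialization: every particle starts from k randomly chosen points.
    particles = [[x for p in rng.sample(data, k) for x in p]
                 for _ in range(n_particles)]
    # Seeding step: one particle starts from the supplied centers instead.
    if seed_centers is not None:
        particles[0] = [x for c in seed_centers for x in c]

    velocities = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in particles]
    pbest_fit = [fitness(p) for p in particles]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - particles[i][d])
                                    + c2 * r2 * (gbest[d] - particles[i][d]))
                particles[i][d] += velocities[i][d]
            f = fitness(particles[i])
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = particles[i][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = particles[i][:], f
    return unflatten(gbest), gbest_fit

# Synthetic two-cluster data standing in for expression profiles (an assumption).
demo_rng = random.Random(42)
demo_data = ([[demo_rng.gauss(0, 0.5), demo_rng.gauss(0, 0.5)] for _ in range(30)]
             + [[demo_rng.gauss(5, 0.5), demo_rng.gauss(5, 0.5)] for _ in range(30)])
demo_seed = kmeans(demo_data, 2, demo_rng)
demo_centers, demo_err = pso_cluster(demo_data, 2, seed_centers=demo_seed)
```

Because the seeded particle's starting position is also its initial personal best, the swarm's best fitness in this sketch can never be worse than that of the seed centers; the random particles only add exploration around that starting point.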