Bioinformatics is an emerging interdisciplinary field. With the completion of various genome projects, bioinformatics research has developed rapidly and has generated a large amount of biological information and data. Database content has grown so quickly that people cannot process it, and much of the newly collected data ends up unused in a "data grave." Yet this massive body of data certainly contains new and essentially important information and knowledge, so extracting useful information from it has become an urgent problem. Gene chip (microarray) technology has developed rapidly in the past few years, and it is now possible to measure the expression of thousands of genes simultaneously. Microarray experiments have therefore produced large amounts of sample data (gene expression data), and identifying genes with similar expression in these data has become a very important task. Cluster analysis is currently one of the main tools for processing gene expression data: it lets us identify genes with similar expression patterns and classify the data. Such classification helps in many areas, including the study of gene expression, gene regulation, cellular processes, and cell subtypes, the functional annotation of unknown genes, and clinical diagnosis and treatment, and it has a wide range of practical applications. Consequently, many researchers have designed and applied a variety of clustering algorithms for gene expression data. The most commonly used methods for clustering gene expression data are K-means clustering, hierarchical clustering, and the self-organizing map (SOM) neural network. K-means clustering is simple and fast.
However, several factors affect the correctness and precision of K-means results, so neither is guaranteed. These factors are the number of clusters, the choice of initial cluster centers, the order of the genes, and the distribution of the gene expression data. Hierarchical clustering is easy to implement, and its results directly show the relationships between genes. However, it requires complex follow-up analysis, and the clustering process has high computational complexity and low efficiency. The self-organizing map can automatically extract features from the sample data and is a global decision-making method. However, the number of clusters and the learning parameters must be set in advance, and training takes a long time. Each clustering algorithm thus has its own advantages and disadvantages. Gene expression data are growing exponentially, which creates an urgent need for new cluster analysis methods. At present, nature-inspired computational intelligence has become a new focus of data analysis techniques and is expected to open new directions for this research. Considering the shortcomings of current gene expression clustering methods, Yu Zheng proposed a new algorithm in 2006 called Particle-Pair Optimization (PPO), derived from the standard particle swarm optimization (PSO) algorithm. PPO obtained better clustering results on a number of gene expression data sets, but some issues remain to be resolved. This paper focuses on how to further improve the PPO gene clustering algorithm to obtain better gene clustering results.
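The sensitivity of K-means to its initial centers can be illustrated with a minimal Python sketch of standard (Lloyd-style) K-means. The sketch makes three assumptions: distance is squared Euclidean, the caller supplies the initial centers, and an empty cluster keeps its previous center. The four-point data set is a made-up toy example, not data from this study. Started from two different sets of initial centers, the same procedure stops at two different fixed points.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans(points, centers, max_iter=100):
    """Standard K-means from caller-supplied initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Move the center to its cluster mean; an empty cluster keeps its center.
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c
            )
        if new_centers == centers:  # converged: assignments no longer move centers
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]

def sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = kmeans(points, [(0.0, 0.0), (10.0, 0.0)])
bad = kmeans(points, [(5.0, 0.0), (5.0, 1.0)])
print(sse(points, *good))  # 1.0
print(sse(points, *bad))   # 100.0
```

The second start is already a fixed point. It groups the points by height instead of by their natural left/right separation, and K-means never leaves it. This is the initialization dependence described above.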
Our main research work is as follows.
(1) K-means was one of the earliest algorithms applied to gene clustering. It is fast, simple in principle, and computationally efficient. However, it is sensitive to the initial cluster centers and requires the number of clusters to be set in advance, which is unreasonable when the number of clusters is unknown. Particle swarm optimization (PSO) is another algorithm often used in gene clustering analysis. PSO is an intelligent optimization algorithm that simulates the behavior of bird flocks. Each individual evaluates the fitness of its own position according to certain rules. It remembers both the best position it has found and the best position found by the whole swarm, and it moves toward these positions. PSO achieves better clustering results in gene cluster analysis, but it easily falls into local optima. Ji Zhen proposed the particle-pair optimization algorithm (PPO), a codebook design method for image vector quantization based on the traditional PSO algorithm. PPO searches the solution space with a small group of particles. In each iteration, each particle performs, in order, the PSO velocity and position update and a standard K-means operation. Experiments applying PPO to gene clustering show that it improves clustering quality compared with K-means and FKM. We studied and analyzed the recently proposed PPO gene clustering algorithm in depth and found that it has some shortcomings. In particular, PPO uses random initialization, so the initial particles may lie far from the optimal solution, which affects the accuracy of the clustering results.
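The per-iteration procedure described above, a PSO velocity and position update followed by a standard K-means operation, can be sketched in Python. This is a minimal illustration under stated assumptions, not Ji Zhen's implementation:

- each particle encodes K cluster centers;
- the swarm is a pair of particles;
- fitness is the sum of squared errors (SSE);
- the inertia weight `w` and acceleration coefficients `c1`, `c2` are illustrative values;
- `ppo_cluster` and its parameters are hypothetical names.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Label of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def sse(points, centers):
    """Sum of squared errors: the fitness each particle tries to minimize."""
    labels = assign(points, centers)
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

def kmeans_step(points, centers):
    """One standard K-means operation: reassign points, move centers to cluster means."""
    labels = assign(points, centers)
    new = []
    for j, c in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == j]
        # An empty cluster keeps its current center.
        new.append([sum(xs) / len(members) for xs in zip(*members)] if members else list(c))
    return new

def ppo_cluster(points, k, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical particle-pair clustering loop: PSO move, then a K-means step."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initialization: each particle starts from k randomly chosen data points.
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(2)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(2)]
    pbest = [[c[:] for c in x] for x in pos]
    pbest_fit = [sse(points, x) for x in pos]
    g = min(range(2), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(iters):
        for i in range(2):
            # PSO velocity and position update, coordinate by coordinate.
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            # Standard K-means operation on the moved particle.
            pos[i] = kmeans_step(points, pos[i])
            fit = sse(points, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, best, history = ppo_cluster(points, k=2)
print(best, history[0])
```

After every PSO move, the K-means step pulls each particle back onto a set of cluster means. The recorded global best fitness can only stay the same or decrease. The random initialization on the first lines of `ppo_cluster` is the step that the improvements in this paper target.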
K-means and PSO are two algorithms commonly used for gene clustering. K-means is fast but not very accurate, while PSO clustering achieves better accuracy at a higher time cost. Building on this, we propose two improved PPO gene clustering algorithms. The main ideas of the two improvements are:
① First use K-means to pre-process the data once, obtaining an initial particle pair that is closer to the optimal pair, and then carry out the PPO evolution. This improved algorithm is named KPPO.
② Use PSO to find the optimal K initial cluster centers and use this PSO clustering result to initialize the particles before the PPO evolution. This improved algorithm is named SPPO.
To verify the effectiveness of KPPO and SPPO, we carried out clustering experiments on three gene expression data sets. Compared with the traditional K-means algorithm and the basic PPO algorithm, the proposed algorithms obtain better-quality initial particle pairs and better clustering results.
(2) PPO still uses the PSO velocity and position update formulas, so it inherits PSO's tendency to become trapped in local optima. Differential evolution (DE) is an intelligent algorithm with good global search capability. We discuss and analyze the principle and characteristics of the standard DE algorithm in detail, and we propose two new hybrid algorithms, KPPO-DE and SPPO-DE. In the initial stage, K-means or PSO, respectively, pre-processes the given data once, and then the PPO algorithm is run.
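The KPPO initialization idea can be sketched in Python. The source does not specify how the K-means pre-pass produces two particles. This sketch therefore assumes, hypothetically, that each particle of the pair is the result of a separate K-means run started from its own random choice of K data points. The helper `kppo_init` is a hypothetical name. SPPO would replace the K-means pre-pass with a PSO clustering run, which is not shown here.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Label of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j])) for p in points]

def sse(points, centers):
    """Sum of squared errors with nearest-center assignment."""
    labels = assign(points, centers)
    return sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))

def kmeans(points, centers, max_iter=100):
    """Standard K-means run to convergence (or max_iter) from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            # An empty cluster keeps its current center.
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else c)
        if new == centers:
            break
        centers = new
    return centers

def kppo_init(points, k, seed=0):
    """Hypothetical KPPO-style seeding: each particle is a K-means result
    started from its own random set of k data points."""
    rng = random.Random(seed)
    starts = [rng.sample(points, k) for _ in range(2)]
    pair = [kmeans(points, s) for s in starts]
    return starts, pair

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
starts, pair = kppo_init(points, k=2, seed=1)
for start, particle in zip(starts, pair):
    print(f"start SSE {sse(points, start):.3f} -> seeded SSE {sse(points, particle):.3f}")
```

A K-means run never increases the SSE of its starting centers. Each seeded particle is therefore at least as good, by SSE, as the random start it began from. That is the intuition behind starting PPO from a K-means pre-pass.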
DE is introduced into the iteration of the elite particles. DE operates on a population of individuals that evolves from generation to generation and gradually approaches the optimal solution. Introducing this global search ability into the PPO process reduces the risk that PPO clustering falls into a poor local optimum. The hybrid algorithms thus combine the advantages of both methods to improve gene clustering accuracy. To evaluate the hybrid algorithms, we carried out cluster analysis experiments on standard gene expression data sets. KPPO-DE and SPPO-DE obtain better results on three evaluation measures:
1. the mean square error of the clustering evaluation function;
2. within-class compactness;
3. between-class separation.
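The DE step applied to the elite particles can be illustrated with the classic DE/rand/1/bin scheme. It consists of mutation x_r1 + F(x_r2 - x_r3), binomial crossover, and greedy one-to-one selection. The source does not state which DE variant, parameters, or elite-selection rule KPPO-DE and SPPO-DE use. This Python sketch therefore assumes:

- DE/rand/1/bin with F = 0.5 and CR = 0.9;
- each individual is a flattened vector of K centers;
- fitness is an assumed mean square error, the average squared distance from each point to its nearest center.

The function names and the random elite population are hypothetical.

```python
import random

def mse(points, flat, k):
    """Assumed MSE criterion: mean squared distance from each point to its
    nearest center. `flat` stores the k centers back to back."""
    dim = len(points[0])
    centers = [flat[j * dim:(j + 1) * dim] for j in range(k)]
    total = sum(min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
                for p in points)
    return total / len(points)

def de_generation(pop, fitness, rng, F=0.5, CR=0.9):
    """One DE/rand/1/bin generation with greedy one-to-one selection."""
    n, dim = len(pop), len(pop[0])
    new_pop = []
    for i, target in enumerate(pop):
        # Mutation: three distinct individuals, all different from the target.
        r1, r2, r3 = rng.sample([j for j in range(n) if j != i], 3)
        mutant = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
        # Binomial crossover; j_rand guarantees at least one mutant coordinate.
        j_rand = rng.randrange(dim)
        trial = [mutant[d] if (rng.random() < CR or d == j_rand) else target[d]
                 for d in range(dim)]
        # Selection: keep the trial only if it is no worse than the target.
        new_pop.append(trial if fitness(trial) <= fitness(target) else target)
    return new_pop

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
k = 2
rng = random.Random(0)

def fitness(vec):
    return mse(points, vec, k)

# Hypothetical elite set: five flattened center sets drawn from the data.
pop = [[x for p in rng.sample(points, k) for x in p] for _ in range(5)]
before = [fitness(v) for v in pop]
for _ in range(20):
    pop = de_generation(pop, fitness, rng)
after = [fitness(v) for v in pop]
print(f"best MSE {min(before):.3f} -> {min(after):.3f}")
```

Because selection is greedy and one-to-one, no individual's fitness ever gets worse from one generation to the next. The difference vectors between randomly chosen individuals supply the kind of global exploration that the text relies on to move PPO out of local optima.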