
Research On An Algorithm Of CUDA-based Cellular Particle-Pair Optimization And Firefly Algorithm In Gene Clustering

Posted on: 2016-11-17  Degree: Master  Type: Thesis
Country: China  Candidate: Z Li  Full Text: PDF
GTID: 2308330464454733  Subject: Computer software and theory
Abstract/Summary:
With the development of the life sciences, the scope of bioinformatics research keeps expanding. Microarray technology, one of the most promising technologies in bioinformatics, has attracted considerable attention from scholars. It can measure thousands of genes at the same time and produces large volumes of gene expression data that carry genetic information. One of the open problems in bioinformatics is how to analyze these vast amounts of gene expression data efficiently and extract information that is meaningful to people. Cluster analysis is widely used for microarray data and has high research value.
Modern bio-inspired intelligent algorithms are attracting growing attention in microarray data analysis, especially for gene clustering. Particle swarm optimization (PSO), genetic algorithms (GA) and other algorithms have been applied successfully to gene clustering and have produced good clustering results. However, because a single algorithm rarely yields the best clustering results, many scholars have proposed improved algorithms with higher clustering performance and computational efficiency. Hybrid algorithms, which combine the strengths of several algorithms so that each compensates for the limitations of the others, are an important direction in current gene clustering research and have recently achieved good results on gene expression data.
With the rapid development of microarray technology, the scale of the data has grown greatly, and the intensity and complexity of the computation now exceed what an ordinary PC can handle.
Compute Unified Device Architecture (CUDA), a technology that has emerged and spread rapidly in high-performance computing (HPC), frees the GPU from the constraints of graphics programming languages and makes high-performance computing on a PC practical. It also opens a new path for scientific research and applications that involve large-scale data processing and intensive computation.
Research on clustering algorithms for gene expression data is ongoing. The particle-pair optimizer (PPO) is a novel gene clustering algorithm with a small population, which makes it easy to coordinate particle positions, and it produces good clustering results. It is also one of the gene clustering algorithms in wide use today. However, its tendency to become trapped in a local minimum early, its weak global search ability and its low accuracy restrict its application and development. Cellular automata (CA), a discrete model of complex systems, are an important tool for studying spatio-temporal evolution. Well-chosen cell rules strengthen communication between a cell and its neighborhood. Through careful design, we integrate PPO with CA to carry out the evolutionary process, using the advantages of CA to counter PPO's premature trapping in local minima.
In addition, the firefly algorithm (FA) is an efficient algorithm for solving complex optimization problems. It applies a disturbance factor during position updates, which helps it avoid becoming trapped in a local optimum. This paper proposes a hybrid of cellular particle-pair optimization (PPO (CA)) and the firefly algorithm (FA), named PPO (CA)-FA.
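The position update with a disturbance factor described for FA can be illustrated with the standard firefly move from the general FA literature. This is a minimal Python sketch and not the implementation used in this work: the function name firefly_move and the parameters beta0 (base attractiveness), gamma (light absorption) and alpha (disturbance scale) are assumptions chosen for illustration.

```python
import math
import random


def firefly_move(x_i, x_j, beta0=1.0, gamma=1.0, alpha=0.2, rng=random):
    """Move firefly i toward a brighter firefly j (standard FA update).

    All parameter names and default values are illustrative assumptions.
    """
    # Squared Euclidean distance between the two fireflies.
    r2 = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    # Attractiveness decays exponentially with squared distance.
    beta = beta0 * math.exp(-gamma * r2)
    # Attraction term plus a uniform random disturbance in [-alpha/2, alpha/2).
    return [
        a + beta * (b - a) + alpha * (rng.random() - 0.5)
        for a, b in zip(x_i, x_j)
    ]
```

With alpha set to 0 the move is deterministic. A positive alpha adds the random disturbance that helps the search leave a local optimum.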
To overcome PPO's early trapping in local minima, weak global search ability and low accuracy, the hybrid algorithm introduces CA into the first iteration of PPO. It uses the corresponding cellular rules to update each particle's individual best fitness value and historical best position, and exploits CA's strong neighborhood communication to avoid falling into a local optimum prematurely. Because CA is added, the number of particles in the first part of PPO must be increased. This not only improves the spread of the global best particle's velocity through the population, but also ensures a thorough neighborhood search and better accuracy. On the basis of PPO (CA), FA is introduced in the second iteration if the minimum error of the result falls within a specified range, so that FA's ability to search the solution space efficiently can be used to obtain more effective clustering results.
To verify the effectiveness of the hybrid algorithm, we compared PPO (CA) and PPO (CA)-FA with K-Means and PPO. Experiments on four standard data sets (CellCycle384, histone.pcl, 6400 and i2282.pcl) show that the proposed PPO (CA)-FA achieves better clustering precision than the other algorithms considered in this paper and also improves mean squared error (MSE), homogeneity and separation. To verify the effect of incorporating CA, we compared PPO (CA) with a variant that only enlarges the particle population without introducing CA (PPO (noCA)). The results show that adding CA improves the stability and accuracy of clustering and mitigates PPO's tendency to become trapped in local optima. Integrating FA enhances the global search ability of PPO (CA) and further improves the clustering results.
For big data and larger-scale problems, however, the hybrid algorithm takes too much time and is inefficient.
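The three quality measures named above (MSE, homogeneity and separation) can be computed for a hard clustering roughly as follows. This is a sketch under assumed definitions, since the abstract does not state the formulas: MSE is taken as the mean squared Euclidean distance from each gene to its cluster centroid, homogeneity as the mean distance to the centroid, and separation as the size-weighted mean distance between centroid pairs. The function names are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def clustering_metrics(data, labels, k):
    """Return (mse, homogeneity, separation) under the assumed definitions.

    Assumes labels are integers in 0..k-1 and every cluster is non-empty.
    """
    clusters = [[x for x, c in zip(data, labels) if c == j] for j in range(k)]
    centers = [centroid(members) for members in clusters]

    # Distance from every gene to the centroid of its own cluster.
    dists = [math.dist(x, centers[c]) for x, c in zip(data, labels)]
    mse = sum(d * d for d in dists) / len(dists)
    homogeneity = sum(dists) / len(dists)

    # Size-weighted average distance between all pairs of centroids.
    num = den = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            w = len(clusters[i]) * len(clusters[j])
            num += w * math.dist(centers[i], centers[j])
            den += w
    separation = num / den if den else 0.0
    return mse, homogeneity, separation
```

Under these distance-based definitions, lower MSE and homogeneity indicate tighter clusters, and higher separation indicates clusters that lie farther apart.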
For better practical applicability, we design and implement a parallel PPO (CA)-FA algorithm with CUDA and apply several optimization techniques that improve the efficiency of the hybrid algorithm while preserving its accuracy. Two optimization methods are proposed, corresponding to different levels of parallelism. The experimental results show that the optimized parallel algorithm achieves a better speedup. On the histone.pcl data set, the best speedup ratio reaches 16.8. Compared with parallel algorithms published in recent years, and taking differences in hardware environment into account, the optimized parallel PPO (CA)-FA achieves the best speedup for data of similar size.
Keywords/Search Tags: gene clustering, particle pair optimization, CA, FA, CUDA