
The Research And Application Of Particle Swarm Optimization Algorithm In Clustering Analysis Of Gene Expression Data

Posted on: 2017-07-16, Degree: Master, Type: Thesis
Country: China, Candidate: L Chen, Full Text: PDF
GTID: 2428330488479869, Subject: Software engineering
Abstract/Summary:
With the development of bioinformatics, especially in the post-genomic era, gene chip technology has come into increasingly wide use, leading directly to the accumulation of large volumes of gene expression data. How to extract meaningful biological information from these data has become one of the hotspots in bioinformatics. Cluster analysis is one of the main tools of bioinformatics analysis: genes with similar expression patterns are grouped together, so that the functions of unknown genes can be inferred from those of known genes. Focusing on cluster analysis of gene expression data, this thesis introduces the clustering algorithms commonly used in bioinformatics, analyzes their characteristics, and proposes improved algorithms based on the existing ones.

Particle Swarm Optimization (PSO), derived from the study of the group behavior of birds, is a global optimization algorithm that is simple and searches quickly. After an in-depth study of PSO, this thesis applies it to the cluster analysis of gene expression data and studies the hybrid of PSO and K-means. Although the hybrid algorithm PSOK-means improves the convergence rate, two problems remain: the premature convergence of PSO and the choice of the clustering k value. To address these two problems, this thesis proposes the following improved algorithms.

(1) Premature convergence of PSO. This thesis proposes a double-disturbance particle swarm clustering algorithm, DDPSOK-means. The idea is to detect when the algorithm has converged. Once it has, a double disturbance of the inertia weight and the extremum is applied, so that the particles can escape the local optimum and perform a global search again. The disturbance stops once several successive disturbances fail to produce further evolution. During normal search the inertia weight follows a nonlinear strategy, and during disturbance it follows a random strategy. The nonlinear strategy strengthens the search capability because it balances the search process; the random strategy increases the diversity of the particles and lets them perform a global search. At the same time, extremum disturbance is applied to each particle's individual extremum, which changes the particle's velocity and direction and strengthens its ability to jump out of a local optimum.

(2) The k value of clustering. The number of clusters must be given in advance and cannot be adjusted adaptively during clustering. Choosing k has long been an important research direction in clustering, because the chosen k strongly affects the quality of the result, and different k values can lead to very different results. This thesis examines the relationship between the fitness function and k, and finds that the fitness value decreases as k increases and that the rate of decrease drops suddenly at the standard k value. Based on this observation, a rate-of-change formula is proposed to capture the inflection point. Finally, a particle swarm clustering algorithm with an adaptive k value is proposed on the basis of DDPSOK-means.

The two improved methods above address the problems of the swarm falling into local optima and of choosing k. They were validated on four groups of gene expression data, and the experimental results show that the proposed algorithms are effective.
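The double-disturbance idea in item (1) can be illustrated with a small sketch. The abstract does not give the actual formulas, so everything concrete here is an assumption made for illustration: the quadratic decay in `nonlinear_inertia`, the 0.9/0.4 and 0.5/1.0 weight ranges, the stagnation test in `has_stagnated`, and the multiplicative noise in `disturb_extremum`. It is not the DDPSOK-means implementation from the thesis.

```python
import random


def nonlinear_inertia(t, t_max, w_start=0.9, w_end=0.4):
    """Inertia weight that decays quadratically from w_start to w_end.

    The quadratic form and the default bounds are assumptions.
    """
    frac = t / t_max
    return w_end + (w_start - w_end) * (1.0 - frac) ** 2


def random_inertia(rng, low=0.5, high=1.0):
    """Random inertia weight used while disturbing (range is an assumption)."""
    return rng.uniform(low, high)


def has_stagnated(best_history, window=5, tol=1e-6):
    """Return True if the best fitness improved by less than tol over `window` steps.

    Assumes fitness is minimized, so best_history never increases.
    """
    if len(best_history) <= window:
        return False
    return best_history[-window - 1] - best_history[-1] < tol


def disturb_extremum(pbest, rng, scale=0.1):
    """Perturb each coordinate of a personal-best position by up to +/- scale."""
    return [x * (1.0 + scale * rng.uniform(-1.0, 1.0)) for x in pbest]


def choose_inertia(t, t_max, best_history, rng):
    """Random weight when the swarm has stagnated, nonlinear schedule otherwise."""
    if has_stagnated(best_history):
        return random_inertia(rng)
    return nonlinear_inertia(t, t_max)


# Hypothetical usage: the best fitness has been flat for six iterations.
rng = random.Random(0)
history = [10.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
w = choose_inertia(50, 100, history, rng)        # stagnated, so drawn at random
new_pbest = disturb_extremum([2.0, -1.0], rng)   # nudged personal best
```

In a full optimizer these helpers would sit inside the velocity-update loop, with a counter that ends the run once several disturbances in a row bring no improvement.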
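The adaptive k selection in item (2) relies on the fitness value falling as k grows, with the rate of decrease dropping sharply at the right k. The abstract does not state the rate-of-change formula, so the sketch below substitutes a simple stand-in: for each candidate k it takes the ratio of the fitness drop going into k to the drop going out of k, and picks the k with the largest ratio. The function name `pick_k`, the ratio criterion, and the sample numbers are all hypothetical.

```python
def pick_k(fitness_by_k, eps=1e-12):
    """Pick the k where the fitness curve bends most sharply.

    fitness_by_k maps each candidate k to a fitness value that is minimized
    (for example, a within-cluster distance sum). The ratio criterion is an
    assumed stand-in for the rate-of-change formula, which the abstract
    does not give.
    """
    ks = sorted(fitness_by_k)
    if len(ks) < 3:
        raise ValueError("need fitness values for at least three k values")

    best_k, best_ratio = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_in = fitness_by_k[prev_k] - fitness_by_k[k]
        drop_out = fitness_by_k[k] - fitness_by_k[next_k]
        # Guard against a flat or rising tail to avoid division by zero.
        ratio = drop_in / max(drop_out, eps)
        if ratio > best_ratio:
            best_k, best_ratio = k, ratio
    return best_k


# Made-up fitness values: large drops up to k = 4, small ones afterwards.
sample = {2: 100.0, 3: 60.0, 4: 30.0, 5: 27.0, 6: 25.0, 7: 24.0}
chosen = pick_k(sample)
```

With these sample values the ratio is largest at k = 4, where a drop of 30 is followed by a drop of 3. The smallest and largest candidate k can never be chosen because each lacks a neighbor on one side, so the candidate range should extend past the plausible values.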
Keywords/Search Tags: Gene expression data, PSO, Clustering, Premature convergence, k value