With the rapid development of information technology, people today can hardly live without the Internet. Network media such as Sina Weibo, WeChat and blogs let people produce all kinds of online information continuously. At the same time, many kinds of data, such as literature and statistics, have moved from paper to electronic form, and the volume of information is growing exponentially. The era of information scarcity is long gone. The problem we now face is how to find target information efficiently and accurately in an ever larger and increasingly disorderly accumulation of text, and this has become a hot research topic.

Text mining is an important branch of data mining that includes text classification, clustering analysis, trend prediction and association rule mining. Text clustering is an unsupervised learning process that automatically groups a text collection according to text similarity, so that texts in the same cluster are highly similar and texts in different clusters are dissimilar, with the aim of discovering knowledge and patterns in massive text datasets. Text clustering has been applied in many fields, such as text mining, information retrieval and personalized recommendation. Most current research on text clustering focuses either on text similarity measures or on applying traditional clustering algorithms to the text clustering problem. However, traditional clustering algorithms such as K-medoids can only produce locally optimal solutions and lack stability and accuracy. Swarm intelligence optimization algorithms such as particle swarm optimization (PSO) and ant colony optimization (ACO) are now being applied to address these problems.

The firefly algorithm (FA) is inspired by the biological behavior of fireflies in nature. It is simple, robust and easy to implement. Compared with the genetic algorithm and the particle swarm algorithm, it has a stronger ability to find the global optimum and converges faster, and it has been applied to many optimization problems, such as clustering analysis and image processing. However, the application of the firefly algorithm to text clustering is still at an early stage. It is therefore of real significance to improve the firefly algorithm according to the characteristics of text clustering while retaining its advantages, in order to obtain better performance. The main work of this paper covers the following aspects:

(1) Improvement of the traditional firefly algorithm. Although the traditional firefly algorithm is simple to operate, robust and easy to implement, it still suffers from slow convergence and premature convergence. To speed up convergence, this paper designs an adaptive step-size rule for FA that adjusts the speed and direction of each firefly when it flies too far from the best solution. In addition, to accelerate the search, this paper proposes a strategy that reduces the running time of the firefly algorithm.

(2) Combination of the improved firefly algorithm with the K-medoids algorithm. Based on an analysis of the characteristics of the firefly algorithm, this paper applies it to text clustering and proposes a hybrid algorithm that combines the firefly algorithm with the K-medoids algorithm.

(3) Experimental analysis. The K-means algorithm, the K-medoids algorithm and the improved algorithm proposed in this paper are compared on their clustering results. The experimental results show that the hybrid algorithm based on the firefly algorithm and the K-medoids algorithm achieves better clustering quality than the traditional algorithms.

This work is the first to use the firefly algorithm for text clustering: imitating the flight behavior of fireflies, it constructs a firefly swarm from factors such as the sensitivity coefficient, width and distance, and uses the swarm to find the optimal center of each cluster. This paper provides a new method for text clustering research and contributes to the improvement and development of the firefly algorithm.
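The abstract does not give the update equations, so the following is only a minimal sketch of how a firefly/K-medoids hybrid can divide the work: the swarm searches globally for good medoids, and a K-medoids pass refines them locally. It assumes the standard firefly movement rule (attractiveness `beta0 * exp(-gamma * r^2)` toward brighter fireflies plus a random step scaled by `alpha`) and assumes that each firefly encodes K candidate centers that are snapped to the nearest data points to act as medoids. The geometric decay of `alpha` is a stand-in, not the paper's adaptive step-size rule, and all function names (`firefly_kmedoids`, `refine_medoids`, `snap_to_medoids`, `cost`, `pairwise_dist`) are hypothetical.

```python
import numpy as np


def pairwise_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between the rows of a (m x d) and b (p x d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def snap_to_medoids(X: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map each continuous center to the index of its nearest data point."""
    return np.argmin(pairwise_dist(centers, X), axis=1)


def cost(X: np.ndarray, medoids: np.ndarray) -> float:
    """K-medoids objective: total distance from each point to its nearest medoid."""
    return float(pairwise_dist(X, X[medoids]).min(axis=1).sum())


def refine_medoids(X: np.ndarray, medoids: np.ndarray, max_iter: int = 20):
    """Local K-medoids refinement by alternating assignment and medoid update."""
    D = pairwise_dist(X, X)
    medoids = medoids.copy()
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(len(medoids)):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest summed distance to its cluster.
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def firefly_kmedoids(X, k, n_fireflies=15, n_iter=50, beta0=1.0, gamma=1.0,
                     alpha=0.2, alpha_decay=0.95, seed=0):
    """Hypothetical FA + K-medoids hybrid: FA for global search, K-medoids to refine."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Each firefly encodes k candidate centers, initialised at distinct random data points.
    pos = np.stack([X[rng.choice(n, k, replace=False)] for _ in range(n_fireflies)])
    costs = np.array([cost(X, snap_to_medoids(X, p)) for p in pos])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # lower cost = brighter, so firefly i moves toward j
                    r2 = ((pos[i] - pos[j]) ** 2).sum()
                    beta = beta0 * np.exp(-gamma * r2)  # attractiveness decays with distance
                    pos[i] += beta * (pos[j] - pos[i]) + alpha * (rng.random((k, d)) - 0.5)
                    costs[i] = cost(X, snap_to_medoids(X, pos[i]))
        alpha *= alpha_decay  # stand-in for the paper's adaptive step-size rule (assumption)
    best = snap_to_medoids(X, pos[np.argmin(costs)])
    return refine_medoids(X, best)
```

For text, `X` would typically hold L2-normalized TF-IDF vectors. For unit vectors the squared Euclidean distance equals 2 - 2cos(a, b), so ranking by Euclidean distance matches ranking by cosine similarity. The full n × n distance matrix in `refine_medoids` limits this sketch to modest corpus sizes, and `gamma` should be scaled to the typical distance between fireflies.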