With the rapid development of Web technology, micro-blogging has become a new social networking platform on which people communicate and build relationships. On this platform, people can freely express their opinions on topics of interest. Because posts are short and easy to publish, a huge amount of information is produced every day, and extracting the hot topics that interest people from this mass of micro-blog data has become an active research area. The main contributions of this paper are as follows. (1) First, building on a survey of domestic and international work on clustering, particle swarm optimization (PSO), and micro-blog topics, the basic theories of these methods and ideas for improving them are analyzed in depth. A Spark cluster for processing big data is built according to the project and data analysis requirements. (2) Second, the KCPTF algorithm (a K-means algorithm based on Chaotic Particle swarm optimization with Time Factor) is proposed. To take full advantage of the global search ability of PSO, a nonlinearly decreasing time factor is introduced so that particles locate the optimal solution quickly. Chaotic search is introduced to keep particles from falling into local optima, which improves their global search ability and preserves the diversity of the swarm. A boundary buffer wall, which dynamically adjusts particle velocity and position, is introduced to keep particles from flying out of the effective search space. The improved PSO is then combined with the K-means algorithm. Simulations on UCI datasets in MATLAB show that KCPTF achieves higher clustering accuracy than the algorithms it is compared with. (3) Finally, KCPTF is applied to clustering Sina micro-blog topics, and a prototype micro-blog topic system that extracts the hot topics within a given period is developed on the Spark cloud computing platform; it meets the project's expected clustering requirements.
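The abstract does not give the formulas behind KCPTF, so the following Python sketch only illustrates how the four named ingredients could fit together; it is not the authors' implementation. The quadratic decay in `time_factor`, the logistic-map perturbation of the global best, the reflect-and-damp rule in `buffer_wall`, and every parameter value and function name are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(centroids, points):
    """Clustering cost: squared distance of each point to its nearest centroid."""
    return sum(min(sq_dist(pt, c) for c in centroids) for pt in points)

def time_factor(t, t_max, w_max=0.9, w_min=0.4):
    """Assumed nonlinear (quadratic) decay of the inertia weight from w_max to w_min."""
    return w_min + (w_max - w_min) * (1 - t / t_max) ** 2

def buffer_wall(x, v, lo, hi):
    """Assumed boundary rule: reflect the position back inside, reverse and damp the velocity."""
    if x < lo:
        return min(hi, lo + (lo - x)), -0.5 * v
    if x > hi:
        return max(lo, hi - (x - hi)), -0.5 * v
    return x, v

def kmeans_refine(centroids, points, iters=10):
    """Plain K-means iterations started from the PSO result."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for pt in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(pt, centroids[i]))
            groups[nearest].append(pt)
        # An empty group keeps its previous centroid.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def pso_kmeans(points, k, n_particles=15, t_max=40, c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    d = len(points[0])
    n = k * d  # each particle encodes k centroids as one flat vector
    lo = [min(p[j] for p in points) for j in range(d)] * k
    hi = [max(p[j] for p in points) for j in range(d)] * k

    def fitness(x):
        return sse([x[i * d:(i + 1) * d] for i in range(k)], points)

    xs = [[rng.uniform(lo[j], hi[j]) for j in range(n)] for _ in range(n_particles)]
    vs = [[0.0] * n for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pfit = [fitness(x) for x in xs]
    best = min(range(n_particles), key=pfit.__getitem__)
    gbest, gfit = pbest[best][:], pfit[best]
    z = rng.uniform(0.1, 0.9)  # state of the chaotic (logistic-map) sequence

    for t in range(t_max):
        w = time_factor(t, t_max)
        for i in range(n_particles):
            for j in range(n):
                v = (w * vs[i][j]
                     + c1 * rng.random() * (pbest[i][j] - xs[i][j])
                     + c2 * rng.random() * (gbest[j] - xs[i][j]))
                xs[i][j], vs[i][j] = buffer_wall(xs[i][j] + v, v, lo[j], hi[j])
            f = fitness(xs[i])
            if f < pfit[i]:
                pbest[i], pfit[i] = xs[i][:], f
                if f < gfit:
                    gbest, gfit = xs[i][:], f
        # Chaotic search: try a few logistic-map perturbations of the global best.
        for _ in range(5):
            cand = []
            for j in range(n):
                z = 4.0 * z * (1.0 - z)
                step = 0.1 * (hi[j] - lo[j]) * (2.0 * z - 1.0)
                cand.append(min(hi[j], max(lo[j], gbest[j] + step)))
            f = fitness(cand)
            if f < gfit:
                gbest, gfit = cand, f

    return kmeans_refine([gbest[i * d:(i + 1) * d] for i in range(k)], points)

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(sorted(pso_kmeans(data, k=2)))
```

In this sketch the PSO stage searches for a good set of initial centroids and K-means then refines them, which is one common way to combine the two; the paper may couple them differently.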