With the development of computer technology, Microblog, as a new form of network media, has influenced all aspects of people's lives. A large amount of information is generated on Microblog every day, much of it commenting on current hot events, describing important events as they unfold, or discussing topics related to those events. Because it spreads information quickly, is easy to use, and has wide influence, Microblog has rapidly stood out from traditional social networks and become an important platform on which the majority of users express their views, share information, and interact socially. Microblog data is characterized by its huge volume and scattered information, among other features.

Microblog information analysis is currently an active research area that touches on a wide range of subjects. To find hot topics on Microblog, data must first be collected. The data is then preprocessed to remove noise, and trigger information is identified to obtain keywords. From these keywords, the correlation between events can be calculated, along with the proportion of all Microblog posts that relate to each event. Finally, hot events are found based on the distribution of event-related posts.

In this paper, we first introduce the current state of research at home and abroad. After studying traditional text clustering methods, we propose a hot topic discovery technique suited to the characteristics of Microblog. We present a Microblog hot topic discovery system that collects data by combining the Microblog open API with dynamic web page analysis, and then preprocesses the collected data through noise reduction, word segmentation, and stop-word filtering. We improve the traditional K-Means and BIRCH clustering methods and combine the two into a two-stage (secondary) clustering algorithm, K-BIRCH, which performs hot-spot judgment on the preprocessed data and extracts hot topics.
Finally, experiments are used to analyze and verify the validity and accuracy of the system.
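
The preprocessing and hot-event steps described above can be illustrated with a minimal sketch. It is not the K-BIRCH algorithm and not the system's actual implementation: it assumes the event keyword sets are already known, uses simple English tokenization in place of word segmentation, and the names `STOP_WORDS`, `preprocess`, `event_shares`, `hot_events`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a full list
# for the target language.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "to", "and", "on", "at", "for"}


def preprocess(post):
    """Remove noise (URLs, @mentions, punctuation), tokenize, drop stop words."""
    text = re.sub(r"https?://\S+", " ", post.lower())  # strip links
    text = re.sub(r"@\w+", " ", text)                  # strip user mentions
    tokens = re.findall(r"[a-z0-9]+", text)            # keep alphanumeric tokens
    return [t for t in tokens if t not in STOP_WORDS]


def event_shares(posts, event_keywords):
    """For each event, return the fraction of posts containing any of its keywords."""
    counts = Counter()
    for post in posts:
        tokens = set(preprocess(post))
        for event, keywords in event_keywords.items():
            if tokens & keywords:
                counts[event] += 1
    total = len(posts)
    return {e: (counts[e] / total if total else 0.0) for e in event_keywords}


def hot_events(posts, event_keywords, threshold=0.3):
    """Return (event, share) pairs at or above the assumed threshold, hottest first."""
    shares = event_shares(posts, event_keywords)
    hot = [(e, s) for e, s in shares.items() if s >= threshold]
    return sorted(hot, key=lambda pair: pair[1], reverse=True)


# Made-up example data, for illustration only.
posts = [
    "Earthquake hits the coast http://t.example/1 #earthquake",
    "@user stay safe, aftershock reported after the earthquake",
    "Great match tonight, what a goal!",
    "Rescue teams arrive at the earthquake zone",
]
events = {
    "earthquake": {"earthquake", "aftershock"},
    "football": {"match", "goal"},
}
print(hot_events(posts, events))  # [('earthquake', 0.75)]
```

In this sketch an event counts as hot when its share of all posts reaches the threshold. A clustering-based system such as the one described would instead discover the events from the data rather than take keyword sets as input.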