Micro-blogging is one of the most important forms of social media in China, with a huge number of users. Users share news and discuss current affairs through short posts, and many companies and government bodies have opened official micro-blog accounts to share news and interact with users. Because micro-blogs are a place where public opinion converges, extracting events of public concern from them and analyzing public attitudes have become a focus of research. Event extraction requires analyzing micro-blog post data, but the posts are short and colloquial, so they carry little semantic information and events are difficult to extract from them. As a result, traditional text event extraction methods achieve low accuracy on micro-blog data. At the same time, as the number of Internet users grows, the volume of micro-blog data keeps increasing, so researchers must also consider the time efficiency of processing it in a big data environment. To address these problems, this thesis proposes a two-stage clustering model on Spark for detecting hot events on micro-blogs. The model processes micro-blog data in two stages, text clustering and semantic clustering, and its computation framework is built on the big data computing engine Spark. In the text-clustering stage, the thesis uses data slicing and an optimized K-Means algorithm to cluster micro-blog posts. In the semantic-clustering stage, keywords are extracted from the text-clustering results to provide sufficient semantic information, and LDA topic models are used for event detection. Experiments show that the model improves both accuracy and time efficiency and detects events more effectively.

This thesis also designs a visual analysis system for hot events. The system uses Web technology and combines user data with post data to support multi-angle analysis of hot events on micro-blogs. It consists of two main functional modules, overall analysis and event analysis. The overall analysis module summarizes the overall situation of hot events through event proportion, gender proportion, and event development. The event analysis module supports personalized retrieval of micro-blog posts, uses geographic information for regional event analysis, and mines event-related vocabulary in depth by building a keyword co-occurrence map. Practical tests show that the system can visualize micro-blog events, which aids the analysis of hot events.
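
The keyword co-occurrence map mentioned above can be illustrated with a minimal sketch. The abstract does not say how the map is built, so this assumes that two keywords co-occur when they appear in the same post and that edges below a count threshold are dropped. The function name `cooccurrence_map`, the `min_count` parameter, and the sample keyword lists are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_map(posts_keywords, min_count=1):
    """Count how often each unordered keyword pair appears in the same post.

    Assumption: "co-occurrence" means two keywords extracted from one post.
    """
    pair_counts = Counter()
    for keywords in posts_keywords:
        # Deduplicate and sort so each pair has a single canonical form.
        unique = sorted(set(keywords))
        pair_counts.update(combinations(unique, 2))
    # Keep only pairs seen at least min_count times (the edges of the map).
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical keyword lists, one list per micro-blog post.
posts = [
    ["flood", "rescue", "city"],
    ["flood", "rescue"],
    ["concert", "city"],
]
edges = cooccurrence_map(posts, min_count=2)
```

Each remaining pair can be drawn as an edge between two keyword nodes, with the count as the edge weight. On real data the per-post keyword lists would come from the keyword-extraction step, and the counting could be distributed rather than run in a single process.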