| With the shift to Web 2.0, social media has flourished. Weibo, as a public social medium, has increasingly become the platform on which users publish personal views and access and share all kinds of information. Users are not restricted by time or place and can edit and share information in real time, anywhere. On Sina Weibo, for example, users post millions of messages to the platform every day, and this large volume of information is closely tied to people's social lives. Analyzing Weibo data can yield valuable information from academic, market, and policy perspectives and reveal what the public needs and pays attention to. With an overall understanding of an event, the relevant departments can manage how it develops and properly guide public opinion on Weibo. Research on information processing techniques for Weibo therefore has significant theoretical and practical value, and Weibo topic discovery and detection has gradually become a research hot spot for many researchers. Weibo topic discovery uses topic detection techniques to organize scattered Weibo posts and present them in a structured way. However, the 140-character limit gives Weibo posts their defining characteristics: the text is short and contains few words. Moreover, because the platform is open, interactive, and aimed at the general public, the text is colloquial, and its wording is less standardized and less rigorous. These properties undoubtedly make Weibo topic research more difficult. 
Timely knowledge of the hot topics that attract public attention, and of the viewpoints on an event, extracted from massive Weibo data helps the state manage public opinion events and helps enterprises understand the market so that they can make the right decisions. Discovering and detecting microblog topics, that is, extracting potentially valuable user information from a large accumulation of Weibo posts, is therefore a challenging task. This thesis presents a Weibo topic discovery method based on topic modeling. Taking the characteristics of Weibo data into account during preprocessing, it adds a candidate-word extraction step to improve word segmentation accuracy and performs feature selection on the Weibo data. It then models the data with a topic model, BTM (Biterm Topic Model), and applies the k-means clustering algorithm to cluster the data in the resulting model space. Finally, cluster analysis over the multiple topic distributions yields a set of keywords that describes the key topics, which completes the topic mining. This method addresses the semantic sparsity caused by the microblog length limit: it uses the rich information of the whole corpus for modeling and inference, which helps with the unclear semantic expression of microblog text, and it avoids the indirect step of expanding the text with an external corpus, which reduces excessive reliance on external corpora. Experiments in this thesis show that the topic-model-based microblog topic discovery algorithm is effective. |
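
As an illustration of why BTM suits short text, the sketch below shows only the biterm extraction step: a biterm is an unordered pair of words that co-occur in the same short post, and BTM models these pairs over the whole corpus rather than per document. This is a minimal sketch and not the thesis's implementation; the function names `extract_biterms` and `count_biterms` and the sample posts are hypothetical, and the posts are assumed to be already segmented and filtered.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text."""
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(corpus):
    """Aggregate biterm counts over the whole corpus."""
    counts = Counter()
    for tokens in corpus:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical, already-segmented posts (illustrative data only).
posts = [
    ["earthquake", "rescue", "team"],
    ["earthquake", "rescue", "donation"],
    ["phone", "launch", "price"],
]

biterm_counts = count_biterms(posts)
```

Because the counts are pooled across all posts, a pair such as ("earthquake", "rescue") gains weight even though each individual post is too short to carry reliable word statistics on its own.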