Hot Topic Detection Based On Micro-blog

Posted on: 2018-07-10  Degree: Master  Type: Thesis
Country: China  Candidate: Z Yu
GTID: 2428330542976892  Subject: Computer technology
Abstract/Summary:
In recent years, micro-blogs have connected people with one another, and the amount of information posted on micro-blog platforms every day has become too large to count. There is an urgent need to exploit this data resource, helping people study the micro-blog data stream and find out what is being discussed on the platform. This thesis carries out hot topic discovery work in four areas:
(1) Micro-blog data acquisition. First, to deal with OAuth 2.0 authorization when collecting data through the micro-blog API, a control thread regulates the size of the data returned by each API call and the frequency of API access, preventing the equipment from going down. Second, to deal with simulated login during crawling, micro-blog pages are analyzed and regular expressions are used to extract user relationships or specific micro-blog content. Finally, micro-blog data is collected by combining the API-based acquisition technique with the crawler.
(2) Micro-blog information preprocessing. First, the statistical characteristics of irrelevant micro-blogs are analyzed, rules for removing interfering data are created, and the micro-blog data is filtered according to these rules. Second, micro-blogs are divided into windows in the order in which they were posted, with each window holding N micro-blogs. Finally, the ICTCLAS Chinese word segmentation system is used to segment the text and tag parts of speech, stop words are removed, and the nouns and verbs in the micro-blogs are selected as candidate words.
(3) Hot topic discovery algorithm. First, word activity is described as a weighted combination of a word's acceleration and its weight: a word's velocity is computed from the change in its relative frequency between adjacent windows, its weight is computed from the current window, and its second-order acceleration is then derived. Next, the candidate words are sorted in descending order of activity, the top-ranked words are selected as keywords for topic detection, and the similarity between keywords is computed using double conditional probability. Finally, the distance between words is derived from their similarity, and a single-pass clustering algorithm groups the keywords by similarity, completing topic detection and yielding the hot topics.
(4) System development. The micro-blog hot topic detection system, including its system management, micro-blog acquisition, and hot topic detection modules, is developed in Python, and the micro-blog database is built on MySQL. System testing shows that the system designed in this thesis can effectively detect the keywords of hot topics contained in micro-blogs; by analyzing these keywords, users can identify the hot topics on micro-blogs during a given period.
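Aspect (1) throttles API access frequency and response size from a control thread, but the abstract does not describe the mechanism. The sketch below assumes a sliding-window rate limiter shared by all crawler threads; `RateLimiter`, `fetch_all`, and the `fetch_page(page, count)` callback are hypothetical names, not part of any Weibo SDK, and the limits are placeholders.

```python
import threading
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls in any `period`-second sliding window.

    Hypothetical helper: the thesis does not specify its throttling logic.
    `clock` and `sleep` are injectable so tests can use a fake clock.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()          # timestamps of recent calls
        self._lock = threading.Lock()  # shared by all crawler threads

    def acquire(self):
        """Block until a call is allowed, then record it."""
        with self._lock:
            while True:
                now = self._clock()
                # Forget calls that have left the window.
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Wait until the oldest recorded call leaves the window.
                self._sleep(self.period - (now - self._calls[0]))


def fetch_all(fetch_page, limiter, total, page_size=50):
    """Collect up to `total` items, requesting at most `page_size` per call.

    `fetch_page(page, count)` is a hypothetical stand-in for one API request.
    """
    items = []
    page = 1
    while len(items) < total:
        limiter.acquire()
        batch = fetch_page(page, min(page_size, total - len(items)))
        if not batch:  # the source has no more data
            break
        items.extend(batch)
        page += 1
    return items
```

Sleeping while holding the lock serializes waiting threads; that is acceptable here because no thread may call the API until the window frees up anyway.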
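Aspect (2) filters interfering data by rules, splits posts into windows of N, and keeps nouns and verbs as candidate words. A minimal sketch, assuming hypothetical rules (strip URLs and @mentions, drop posts containing a few advertising phrases or shorter than five characters) in place of the statistically derived rules of the thesis, and assuming the ICTCLAS output is already available as (word, tag) pairs whose noun and verb tags begin with "n" and "v".

```python
import re

# Hypothetical rules; the thesis derives its own from irrelevant micro-blogs.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[\w\-]+")
AD_RE = re.compile(r"(转发抽奖|优惠券|代购)")  # assumed advertising markers
MIN_LENGTH = 5  # assumed: shorter posts carry too little content


def clean(text):
    """Strip URLs and @mentions, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split())


def filter_posts(posts):
    """Drop ad-like posts and posts that are too short after cleaning."""
    kept = []
    for post in posts:
        if AD_RE.search(post):
            continue
        cleaned = clean(post)
        if len(cleaned) >= MIN_LENGTH:
            kept.append(cleaned)
    return kept


def split_windows(posts, n):
    """Divide time-ordered posts into consecutive windows of N posts.

    The last window may hold fewer than N posts.
    """
    return [posts[i:i + n] for i in range(0, len(posts), n)]


def candidate_words(tagged, stopwords):
    """Keep nouns and verbs that are not stop words.

    `tagged` holds (word, tag) pairs, assumed to come from a segmenter such
    as ICTCLAS whose noun tags start with "n" and verb tags with "v".
    """
    return [w for w, tag in tagged
            if w not in stopwords and tag[:1] in ("n", "v")]
```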
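The word-activity measure in aspect (3) is described only qualitatively, so the formulas below are assumptions, one plausible reading rather than the thesis's definitions: relative frequency per window, velocity as the change in relative frequency between adjacent windows, second-order acceleration as the change in velocity, and activity as an `alpha`-weighted sum of acceleration and current-window relative frequency (standing in for the word weight). `relative_freqs`, `word_activity`, and `alpha` are hypothetical names.

```python
from collections import Counter


def relative_freqs(window_words):
    """Relative frequency of each word within one window."""
    counts = Counter(window_words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def word_activity(windows, alpha=0.5):
    """Rank words of the last window by activity, highest first.

    Assumed definitions:
      f_t(w)                               relative frequency in window t
      v_t(w) = f_t(w) - f_{t-1}(w)         velocity
      a_t(w) = v_t(w) - v_{t-1}(w)         second-order acceleration
      activity = alpha * a_t(w) + (1 - alpha) * f_t(w)
    Windows before the first one are treated as empty.
    """
    freqs = [relative_freqs(ws) for ws in windows]
    while len(freqs) < 3:
        freqs.insert(0, {})
    f_prev2, f_prev, f_now = freqs[-3], freqs[-2], freqs[-1]
    scores = {}
    for w, ft in f_now.items():
        v_now = ft - f_prev.get(w, 0.0)
        v_prev = f_prev.get(w, 0.0) - f_prev2.get(w, 0.0)
        acc = v_now - v_prev
        scores[w] = alpha * acc + (1 - alpha) * ft
    # Descending order of activity; the top words become keywords.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A word that suddenly jumps in the latest window gains both a large acceleration and a high frequency, so it rises to the top, while a steady word whose share shrinks is pushed down.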
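Aspect (3) measures keyword similarity with a double conditional probability and turns it into a distance. A sketch under two assumptions the abstract does not settle: the probabilities are estimated from post-level co-occurrence, P(a|b) = N(a,b) / N(b), and the two conditionals are combined by their mean, with distance = 1 - similarity. `doc_sets`, `similarity`, and `distance` are hypothetical names.

```python
def doc_sets(posts_words, keywords):
    """Map each keyword to the set of post indices that contain it."""
    kw = set(keywords)
    sets = {k: set() for k in keywords}
    for i, words in enumerate(posts_words):
        for w in kw.intersection(words):
            sets[w].add(i)
    return sets


def similarity(a, b, sets):
    """Assumed double-conditional-probability similarity.

    P(a|b) = N(a,b) / N(b) and P(b|a) = N(a,b) / N(a), combined by their mean.
    """
    na, nb = len(sets[a]), len(sets[b])
    if na == 0 or nb == 0:
        return 0.0
    nab = len(sets[a] & sets[b])
    return (nab / nb + nab / na) / 2


def distance(a, b, sets):
    """Distance between two keywords, derived from their similarity."""
    return 1.0 - similarity(a, b, sets)
```

Taking both conditionals keeps the measure symmetric and stops a very frequent word from appearing close to every rarer word it happens to co-occur with.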
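The last step of aspect (3) applies single-pass clustering to the keywords. A generic sketch: each keyword is visited once, compared with every existing cluster by its average similarity to the cluster's members, and either joins the most similar cluster or starts a new one. The averaging rule and the `threshold` value are assumptions; any similarity function, such as one built on double conditional probability, can be passed in as `sim`.

```python
def single_pass(items, sim, threshold):
    """Single-pass clustering.

    Items are visited once, in the given order (e.g. descending activity).
    An item joins the cluster with the highest average similarity if that
    score reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []
    for item in items:
        best, best_score = None, -1.0
        for cluster in clusters:
            score = sum(sim(item, m) for m in cluster) / len(cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best.append(item)
        else:
            clusters.append([item])
    return clusters
```

Because each keyword is compared only with clusters formed so far, the result depends on the visiting order; feeding keywords in descending activity lets the most active words seed the topics.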
Keywords/Search Tags: Weibo, hot topic detection, word activity, Python