
Design And Implementation Of Microblog Healthy Hot Topic Discovery System

Posted on: 2018-01-16  Degree: Master  Type: Thesis
Country: China  Candidate: S Shen  Full Text: PDF
GTID: 2348330533466288  Subject: Computer technology
Abstract/Summary:
With the continuous development of Internet technology, microblogging has emerged as a new type of open social networking platform in the Web 3.0 era. Because it is easy to use, spreads information rapidly, and is open and interactive, microblogging has become an important platform for sharing, accessing, and disseminating information. Microblogs produce large volumes of data every day, and these data carry rich metadata. Traditional topic discovery models and text clustering techniques are already widely used in many fields with good results. However, they have significant limitations when applied to short microblog texts, which poses new challenges for microblog hot topic discovery. How to obtain hot topic information quickly and accurately from large volumes of microblog data, and present it promptly to microblog users, is therefore an urgent problem for microblog topic discovery technology. Against this background, this thesis takes the characteristics of microblogs into account, improves the traditional topic discovery model and text clustering algorithm, and proposes a microblog topic discovery algorithm based on the VSM model and the MLDA model. On this basis, a microblog health topic discovery system is designed. The main work is as follows. First, the open API of the microblog platform and web crawler technology are used to crawl microblog user information and microblog content, respectively.
The microblog text data are then preprocessed through data cleaning, word segmentation, and related operations. Second, the VSM model and the MLDA model are used to jointly model the microblog text data, construct feature vectors for microblog texts, and compute microblog text similarity from both representations jointly. Third, the microblog data are clustered in two stages, using an improved Single-pass algorithm and an agglomerative hierarchical clustering algorithm, to extract microblog hot topics and to calculate and rank topic heat. Finally, a variety of tests and analyses verify the effectiveness and accuracy of the system.
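The abstract does not give the thesis's formulas, so the following Python sketch only illustrates the general idea of joint VSM/topic similarity feeding a Single-pass clustering step, under stated assumptions. Texts are assumed to be already segmented into tokens; each text's topic distribution is assumed to come from some LDA-style model (the internals of MLDA are not described here, so the topic vectors are simply taken as input); the joint similarity is a hypothetical weighted sum, lam * cos(VSM) + (1 - lam) * cos(topics); and each incoming text is compared with the running centroid of every existing cluster. The weight `lam`, the `threshold`, the smoothed IDF, and all function names are illustrative assumptions, not the thesis's actual method. The second (agglomerative) clustering stage and the heat ranking are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight) for already-segmented texts.

    Uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1, so no weight is zero.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine_sparse(a, b):
    """Cosine similarity of two sparse dict vectors (the VSM part)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cosine_dense(a, b):
    """Cosine similarity of two topic-distribution vectors (the topic part)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def joint_similarity(vsm_a, topic_a, vsm_b, topic_b, lam=0.5):
    """Hypothetical linear combination of VSM and topic similarity."""
    return lam * cosine_sparse(vsm_a, vsm_b) + (1 - lam) * cosine_dense(topic_a, topic_b)


def single_pass(vsm_vecs, topic_vecs, threshold=0.3, lam=0.5):
    """Assign each text, in arrival order, to its most similar cluster or start a new one."""
    clusters = []
    for i, (v, z) in enumerate(zip(vsm_vecs, topic_vecs)):
        best, best_sim = None, -1.0
        for c in clusters:
            # Summed member vectors stand in for the centroid: cosine ignores scale.
            sim = joint_similarity(v, z, c["vsm"], c["topic"], lam)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            for t, w in v.items():
                best["vsm"][t] = best["vsm"].get(t, 0.0) + w
            best["topic"] = [x + y for x, y in zip(best["topic"], z)]
        else:
            clusters.append({"members": [i], "vsm": dict(v), "topic": list(z)})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    docs = [["flu", "vaccine", "clinic"], ["flu", "vaccine", "shot"],
            ["diet", "sugar", "weight"], ["diet", "weight", "exercise"]]
    # Made-up two-topic distributions standing in for MLDA output.
    topics = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
    print(single_pass(tfidf_vectors(docs), topics))  # [[0, 1], [2, 3]]
```

Raising `threshold` yields more, smaller clusters. In a two-stage scheme like the one described above, a later agglomerative pass could then merge such fragments into larger topics.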
Keywords/Search Tags: Micro-blog, VSM model, MLDA model, secondary clustering, hot topic discovery, heat calculation