
Research on Hot Topic Detection Based on Micro Blog

Posted on: 2017-04-09; Degree: Master; Type: Thesis
Country: China; Candidate: Y He; Full Text: PDF
GTID: 2348330503484922; Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology and the growing popularity of smartphones, the number of Micro Blog users has increased every year. Micro Blog posts are short and can be published quickly and conveniently. People use Micro Blog to express their views and comment on matters that interest them, the volume of posts has grown explosively, and hot social issues spread quickly through the platform. Micro Blog has gradually become one of the main channels through which online public opinion spreads. Compared with news, web pages and other traditional texts, Micro Blog text is more colloquial and fragmented. Based on Micro Blog data, this thesis studies hot topic discovery through data preprocessing, text modeling, topic clustering and related steps. Finally, the detected hot topics are ranked by heat. The main contents include:
(1) The initial data is obtained through the API of the Sina Micro Blog platform. The collected JSON-format Micro Blog data is parsed and then preprocessed by data filtering, word segmentation, stop-word removal and other operations to obtain the initial Micro Blog text corpus.
(2) The traditional vector space model is high-dimensional and does not consider semantic relationships between words. To address this, an LSA-based modeling method for Micro Blog text is proposed. Singular value decomposition of the word-document matrix yields an approximate matrix that on the one hand retains the semantic relationships between words and on the other hand reduces the dimension of the data, so that the high-dimensional vector space is mapped into a low-dimensional semantic space. The method is verified on the collected Micro Blog data.
(3) Based on an in-depth study of classical clustering algorithms and on the characteristics of Micro Blog data, a two-stage clustering algorithm that combines partitional clustering with incremental clustering is proposed. In the first stage, the traditional K-means algorithm is improved to address the problem of randomly selected initial centroids, and the improved K-means algorithm performs the first clustering. In the second stage, new data is clustered a second time with an incremental clustering method. Comparative experiments are carried out and analyzed.
(4) Topic heat is defined by the number of comments and forwards. Topics are then ranked by heat and compared with the official hot topics to verify the effectiveness of the method in this thesis.
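
The LSA step in item (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy documents, the raw term-count weighting, the choice k=2, and the helper names build_term_document_matrix and lsa are all assumptions made for illustration. It only shows how a truncated singular value decomposition of a word-document matrix gives a low-rank approximation and low-dimensional document vectors.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return (matrix, vocab); rows are terms, columns are documents.

    Raw counts are used as weights here (an assumption; the abstract
    does not say which weighting the thesis uses).
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            matrix[index[tok], j] += 1.0
    return matrix, vocab

def lsa(matrix, k):
    """Truncated SVD that keeps the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    approx = u_k @ np.diag(s_k) @ vt_k      # rank-k approximation of the matrix
    doc_vectors = (np.diag(s_k) @ vt_k).T   # one k-dimensional row per document
    return approx, doc_vectors

# Hypothetical, already segmented and stop-word-filtered posts.
docs = [
    ["flood", "rescue", "city"],
    ["flood", "rain", "city"],
    ["concert", "ticket", "singer"],
    ["singer", "concert", "tour"],
]
matrix, vocab = build_term_document_matrix(docs)
approx, doc_vectors = lsa(matrix, k=2)
print(matrix.shape, doc_vectors.shape)
```

In this toy corpus the 8-term space is reduced to 2 dimensions, and the two flood posts land on the same point in the reduced space even though each contains a word the other lacks. That is the kind of semantic grouping the low-dimensional space is meant to expose before clustering.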
Keywords/Search Tags: Micro Blog API, data parsing, LSA, K-means clustering algorithm