Research On The Key Technology Of Hot Spot Topic Discovery Based On Microblogging

Posted on: 2018-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: H B Yang
Full Text: PDF
GTID: 2348330518967151
Subject: Computer technology
Abstract/Summary:
In recent years, with continued economic development, the Internet era has deepened and Internet thinking has penetrated major industries. As fast and free access to information has won wider acceptance, more and more people choose to express their views online. Microblogging in particular has attracted growing attention: it is a form of entertainment for its users and has also become an independent platform for information dissemination. However, because microblogging involves a large number of participants, a large volume of data, and complex data formats, obtaining hot microblogging information manually is almost impossible. How to extract the hot issues that affect people's livelihood from a large, chaotic body of microblogging information, and how to analyze microblogging public opinion, have become a focus of current research.

This thesis studies the technologies related to hot topic discovery and designs and implements a microblogging hot topic discovery system. At present, the main technologies in the field of microblogging hot spot discovery include microblogging data acquisition, Chinese word segmentation and feature vector extraction, machine learning classification and clustering algorithms, and topic discovery algorithms. The system designed and implemented here consists of five modules: the microblogging acquisition subsystem, the microblogging preprocessing subsystem, the topic discovery subsystem, the application management subsystem, and the data center.

First, the thesis verifies that a common crawler system cannot obtain microblogging data, and then develops the data acquisition system using the API provided by the microblogging operator. Second, it uses word2vec vectors to capture the semantic information of the segmented text, addressing the defects of text feature representation methods that cannot express semantic information or cannot express the topic information of short texts. Third, it combines the Single-Pass clustering algorithm with the DBSCAN clustering algorithm and applies a sampling strategy to the historical topics and the new [...]. It then combines the support vector machine classification algorithm with the K-nearest-neighbor classification algorithm to improve the accuracy of topic recognition. Finally, building on this work, the platform is constructed to display topic rankings and heat charts intuitively.

The thesis studies the technologies related to topic discovery in detail. The resulting system can find hot microblogging events accurately and quickly, and gives the government and relevant enterprises a reliable, timely, and effective means of discovering public opinion hot spots, so that the relevant organizations and individuals can respond with timely and effective measures, which will ultimately be conducive to the harmonious development of society.
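
The Single-Pass clustering step named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis combines Single-Pass with DBSCAN and a sampling strategy, whereas the code below shows only plain Single-Pass over document vectors that are assumed to be precomputed (for example, averaged word2vec vectors). The function names `cosine` and `single_pass`, the similarity threshold of 0.8, and the running-mean centroid update are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold=0.8):
    """Assign each vector, in arrival order, to the most similar existing topic,
    or open a new topic when no centroid reaches the (assumed) threshold."""
    centroids = []  # running mean vector of each topic
    counts = []     # number of documents assigned to each topic
    labels = []     # topic index for each input vector
    for v in vectors:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centroids):
            sim = cosine(v, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            # Fold the new document into the running mean of the matched topic.
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], v)]
            counts[best] = n + 1
            labels.append(best)
        else:
            # No topic is similar enough: start a new one seeded with this document.
            centroids.append(list(v))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids


# Toy 2-dimensional vectors standing in for document embeddings.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = single_pass(docs, threshold=0.8)
print(labels)
```

Because each document is compared only against the current topic centroids and is never revisited, the result depends on arrival order and on the chosen threshold. That sensitivity is one plausible reason to pair Single-Pass with a density-based method such as DBSCAN, as the abstract describes.
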
Keywords/Search Tags: Word2vec Vector, Text Clustering, Single-Pass Algorithm, DBSCAN Algorithm, Topic Detection