Research On Hotspot Detection Technology Of Microblogging Public Opinion Based On Text Clustering

Posted on: 2016-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: S L Liu
GTID: 2348330542975887
Subject: Engineering
Abstract/Summary:
Microblogging, a newly emerging online communication medium, has been very popular with netizens since its birth and has become an important platform for users to express their wishes. Although the microblog platform makes it flexible and convenient for users to take part in communication, it also poses a great challenge to Internet public opinion monitoring. Users post large amounts of information every day, containing their emotions and their views on various social problems. Hence, in China, microblogs have become a carrier of public opinion. At present, public opinion monitoring systems in China and abroad mainly target BBS forums, news sites and other online media, and the detection of public opinion on microblogs still needs further improvement. Against this background, this paper proposes a method for detecting hot topics on microblogs.

The proposed method improves on existing approaches in three aspects: the extraction of feature words from microblog data, the calculation of feature-word weights, and the text clustering method. First, exploiting the distinctive format of microblog posts, we preferentially select words that carry more information or appear in more important positions when extracting feature words. This reduces the dimension of the feature space and improves the computational efficiency of the system. Second, according to the amount of information each feature word carries, we increase the weights of words that appear in important positions when computing feature-word weights; we also increase the weights of words contained in microblogs that are heavily forwarded or commented on. Third, for clustering the microblog posts, we propose a density-based K-means clustering algorithm. The algorithm uses the distribution of the data objects in the vector space to compute the cluster centers, which avoids the influence of noise and outliers in the data and improves the accuracy and stability of the clustering results.

Finally, to demonstrate the effectiveness of the proposed hot-topic detection method, we verified the density-based K-means clustering algorithm experimentally and compared its performance with the standard K-means algorithm. The results indicate that the density-based K-means algorithm considerably improves every performance metric evaluated.
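The abstract does not say how candidate feature words are scored, so what follows is only a minimal sketch of position-aware feature selection under stated assumptions. It assumes each post has already been segmented into words (Chinese word segmentation is out of scope here) and treats words inside a #topic# hashtag as the "important position". The dictionary keys `tokens` and `hashtag_tokens`, the function `select_features`, the `position_bonus` parameter and the scoring formula are all hypothetical. Keeping only the top `k` words is what shrinks the feature space.

```python
import math
from collections import Counter


def select_features(posts, k, position_bonus=2.0):
    """Return the k highest-scoring candidate feature words (hypothetical sketch).

    Each post is assumed to be a dict with:
      "tokens":         pre-segmented words of the post
      "hashtag_tokens": words that appeared inside a #topic# hashtag
                        (assumed here to be the "important position")
    Assumed score: smoothed_idf(w) * (df(w) + position_bonus * hashtag_df(w)).
    """
    n = len(posts)
    df = Counter()          # number of posts containing the word
    hashtag_df = Counter()  # number of posts containing the word in a hashtag
    for post in posts:
        df.update(set(post["tokens"]))
        hashtag_df.update(set(post["hashtag_tokens"]))

    scores = {}
    for word, d in df.items():
        idf = math.log((n + 1) / (d + 1)) + 1.0  # smoothed inverse document frequency
        scores[word] = idf * (d + position_bonus * hashtag_df[word])

    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:k]
```

Setting `position_bonus=0.0` reduces the score to a plain frequency-times-IDF ranking, which can favour common but uninformative words; the bonus is the knob that pushes hashtag words ahead of them.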
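One way to realise the two weighting rules in the abstract (a boost for words in important positions and a boost for heavily forwarded or commented posts) is to scale a TF-IDF baseline. This is a sketch under assumptions: TF-IDF as the base weight, hashtag words as the important position, and a logarithmic popularity factor are choices made here, not formulas taken from the thesis. `weight_vector`, `position_boost`, `popularity_scale` and the post fields `reposts` and `comments` are hypothetical names, and `idf` is assumed to be precomputed over the whole corpus.

```python
import math
from collections import Counter


def weight_vector(post, idf, position_boost=1.5, popularity_scale=0.1):
    """Return {word: weight} for one post (hypothetical weighting scheme).

    post: dict with "tokens", "hashtag_tokens", "reposts", "comments".
    idf:  precomputed {word: idf value}; unknown words get weight 0.
    """
    tf = Counter(post["tokens"])
    total = sum(tf.values())
    # Popularity factor grows logarithmically with reposts + comments,
    # so heavily forwarded or commented posts weigh more.
    popularity = 1.0 + popularity_scale * math.log1p(post["reposts"] + post["comments"])
    hashtags = set(post["hashtag_tokens"])

    weights = {}
    for word, count in tf.items():
        w = (count / total) * idf.get(word, 0.0)  # baseline TF-IDF
        if word in hashtags:
            w *= position_boost                   # important-position boost
        weights[word] = w * popularity
    return weights
```

The logarithm damps popularity so that a single viral post does not swamp the vocabulary of every other post: with the default scale, a post with 99 reposts and no comments is scaled by about 1.46.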
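The abstract states only that the density-based K-means algorithm uses the distribution of the data objects in the vector space to compute cluster centers, so that noise and outliers have less effect. The sketch below implements one common variant of that idea, not necessarily the thesis's exact algorithm: a point's density is its number of neighbours within a radius `eps`, points with no neighbours are labelled as noise (-1) and excluded, the densest point becomes the first center, and each further center maximises density times distance to the nearest chosen center, after which ordinary Lloyd iterations run. Euclidean distance (via `math.dist`, Python 3.8+) and the names `density_kmeans`, `eps` and `max_iter` are assumptions; text vectors are often compared with cosine distance instead.

```python
import math


def density_kmeans(points, k, eps, max_iter=100):
    """Density-initialised K-means (one assumed variant). Returns (centers, labels).

    Points with no neighbour within eps are treated as noise: label -1,
    never chosen as centers and never used to update centers.
    """
    n = len(points)
    # 1. Density of each point: how many other points lie within eps.
    density = [
        sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps)
        for i in range(n)
    ]
    # 2. Keep only non-isolated points for clustering.
    core = [i for i in range(n) if density[i] > 0]
    if len(core) < k:
        raise ValueError("fewer than k non-noise points; try a larger eps")

    # 3. Initial centers: densest point first, then dense points far from chosen ones.
    chosen = [max(core, key=lambda i: density[i])]
    while len(chosen) < k:
        def score(i):
            nearest = min(math.dist(points[i], points[c]) for c in chosen)
            return density[i] * nearest
        chosen.append(max((i for i in core if i not in chosen), key=score))
    centers = [list(points[i]) for i in chosen]

    # 4. Standard Lloyd iterations over the non-noise points only.
    labels = [-1] * n
    for _ in range(max_iter):
        new_labels = [-1] * n
        for i in core:
            new_labels[i] = min(range(k), key=lambda c: math.dist(points[i], centers[c]))
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [points[i] for i in core if labels[i] == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, labels
```

For example, with two tight groups around (0, 0) and (10, 10) plus an isolated point at (50, 50), `k=2` and `eps=1.5`, the isolated point is labelled -1 and each group forms its own cluster. Plain K-means with random initial centers could instead place a center on the outlier, which is the instability the density-based initialisation is meant to avoid.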
Keywords: microblog, topic detection, text clustering, density, K-means clustering algorithm