
Research On Internet Public Opinion Event Mining Based On Internet Data Clustering Algorithms

Posted on: 2013-04-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Xu | Full Text: PDF
GTID: 2268330392968010 | Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of the global network, the Internet, as a new and growing medium, has quickly overtaken traditional media in number of users and social influence, owing to its openness and freedom. In recent years in particular, a growing number of significant events in China have unfolded online, and their scope has gradually extended from education and entertainment to international and domestic economic, political, civic, and other areas, producing waves of online public opinion at different levels.

In this context, techniques for mining the core words of public opinion and for clustering information have become increasingly important and are a valuable direction for further research. This thesis mainly completes the following work:

First, based on an analysis of traditional opinion analysis technologies, we propose an adaptive data organization structure. With it, the research object is no longer confined to a single carrier of online public opinion, which lays the foundation for analyzing corpus data from different carriers on the same platform.

Second, because the core words in the current flood of Internet words and in explosive online group events may not conform to Chinese grammar, we designed a new algorithm for mining the core words of online public opinion, the CEW (Continuous Effective Words) algorithm, which is intended to improve the ICTCLAS segmentation system.

Third, based on a study of classical clustering algorithms, we propose a fast and efficient improved clustering algorithm that targets the characteristics of large volumes of online public opinion data and does not associate isolated points with any cluster, which can improve accuracy at some cost in recall.

Finally, manual inspection of the results of a large number of data tests confirms that the algorithms designed in this work perform well on corpus data sets from multiple carriers. In addition, by optimizing the program structure, we reduced the time complexity of the algorithm from quadratic to linear, which lays a good foundation for its application to large data sets in the future.
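The abstract does not describe the internals of the CEW algorithm, so the following is only a minimal sketch of the general idea behind mining new words on top of a segmenter: count runs of adjacent tokens and keep the runs that recur. The function name `candidate_core_words`, its parameters, and the sample data are all hypothetical, and the input is assumed to be already segmented (for example by ICTCLAS); this is not the thesis's method.

```python
from collections import Counter


def candidate_core_words(segmented_docs, max_len=3, min_count=2):
    """Return frequent runs of adjacent tokens as candidate new words.

    segmented_docs: list of token lists (assumed output of a word segmenter).
    max_len: longest run of adjacent tokens to join (hypothetical parameter).
    min_count: minimum number of occurrences to keep a candidate.
    """
    counts = Counter()
    for tokens in segmented_docs:
        # Join every run of 2..max_len adjacent tokens and count it.
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts["".join(tokens[i:i + n])] += 1
    return {word: c for word, c in counts.items() if c >= min_count}


# Illustrative data: a segmenter that splits the slang word "给力" in two.
docs = [
    ["给", "力", "的", "消息"],
    ["真", "给", "力"],
    ["不", "给", "力", "啊"],
]
print(candidate_core_words(docs, max_len=3, min_count=3))
```

Each document is scanned once with a bounded window, so the cost grows linearly with the total number of tokens when `max_len` is fixed. A real system would also need to filter candidates, for instance by removing stop words or scoring how strongly the joined tokens belong together.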
Keywords/Search Tags: Internet public opinion, new Internet word discovery, keyword mining, data clustering, multi-carrier