Research Of Hot Topics Detection In Internet Public Opinion Based On Cloud Computing

Posted on: 2017-04-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Wang | Full Text: PDF
GTID: 2308330503969142 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, online communication has become a major channel for spreading social information. Whenever a sudden, sensitive issue arises, public opinion quickly gathers on the Internet and forms a public opinion event. Internet public opinion, however, involves a wide range of topics, a large volume of information, and complex subjects, which makes it easy for information and comments to appear that are, in the traditional sense, vulgar, brazen, pornographic, or violent, and that may even threaten national harmony, stability, and safety. Detecting hot topics in Internet public opinion enables decision makers to become aware, promptly and precisely, of the topics that concern netizens. Hot topic detection is a text mining process, but the traditional text mining process is poorly suited to the characteristics of Internet public opinion. It is therefore increasingly important to address the efficiency, availability, and usability of data mining algorithms. This is the background to the topic selection and research of this thesis. Drawing on recent research results in text mining theory and technology, this thesis analyzes the traditional text mining model and builds models for data collection, Chinese word segmentation, feature extraction, feature weight computation, and the text feature vector space, together with similarity analysis, an implementation of the clustering algorithm, and heat analysis. To address the excessive number of feature items that arises when a traditional text representation model is built, and taking into account the short-text character of Internet public opinion data, this thesis proposes a method, based on semantic feature items, for reducing the dimension of the representation model of frequent public short texts.
For clustering, the classic incremental algorithm Single-Pass is chosen. To address the deficiencies of this algorithm, the thesis puts forward an improved Single-Pass clustering algorithm that is able to solve problems such as sensitivity to the order of the input data and low computational efficiency, and it also proposes an analysis model for the heat of Internet public opinion. Building on these steps, the thesis implements parallel algorithms based on MapReduce for data pre-processing and clustering analysis, and uses charts to analyze their efficiency and the quality of their results. The cloud-computing-based method for detecting hot topics in Internet public opinion discussed in this work remedies, to some extent, the inability of the traditional text mining model to handle massive Chinese text data. Moreover, the approach has a low project cost and is easy to explore, and it could be applied in practice as an effective way to control Internet public opinion.
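For readers unfamiliar with the baseline, the following is a minimal sketch of the classic Single-Pass algorithm named above, not the improved variant proposed in the thesis. It assumes documents have already been segmented into tokens, uses raw term counts as feature weights, and uses cosine similarity with an arbitrary threshold of 0.3. The function names `cosine` and `single_pass` and the sample data are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.3):
    """Classic Single-Pass clustering over pre-tokenized documents.

    Each document is read once, in input order. It joins the most
    similar existing cluster if the similarity reaches the threshold
    (an assumed value); otherwise it opens a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc index]}
    for i, tokens in enumerate(docs):
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Summed counts stand in for the centroid; cosine ignores scale.
            best["centroid"].update(vec)
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


# Hypothetical sample data: four short, already-segmented posts.
docs = [
    ["flood", "city", "rescue"],
    ["flood", "rescue", "team"],
    ["stock", "market", "fall"],
    ["market", "stock", "crash"],
]
for cluster in single_pass(docs):
    print(cluster["members"])
```

Because each document is compared only against the clusters that exist when it arrives, the result can change with the input order, which is the sensitivity the abstract refers to. Cluster size is one crude proxy for topic heat; the heat model in the thesis is not specified here.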
Keywords/Search Tags: Internet public opinion, Cloud computing, Hot topic detection, Incremental clustering, Dimension reduction