Research And Implementation Of Internet Public Opinion Analysis System Based On Hadoop

Posted on: 2016-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Tong
Full Text: PDF
GTID: 2298330467497437
Subject: Electronic and communication engineering
Abstract/Summary:
Public opinion is the collection of social and political attitudes, beliefs, values, and ideas that people express in response to the occurrence, development, and course of a particular social event or phenomenon within a certain time and scope.

With the rapid, large-scale development of the Internet, Web information is growing at an astonishing speed, and the Web has become the most comprehensive information repository in human history in terms of the types, number, and scale of its resources. Its influence and growth potential have made the Internet one of the mainstream media for news dissemination and a main carrier of public opinion.

Under these conditions, however, negative online public opinion can spread easily among a large number of Internet users, which can have a great impact on the harmonious development of society. It is therefore necessary to use modern natural language processing and data mining technology to analyze and process network data, and it is important for the relevant government departments to learn of public opinion information in a timely and accurate manner.

This thesis examines the implementation of network public opinion analysis based on the characteristics of network public opinion. It sets out the background, significance, current state of research in China and abroad, goals, and structure of the research. It also introduces the advantages of the Hadoop platform for data processing, data collection technology, the text vector space model, and clustering algorithms.

The system is built on Hadoop and is divided into five modules: the data collection module, the data preprocessing module, the data clustering module, the public opinion analysis module, and the application module. Together they fulfil the functional requirements of network public opinion analysis.
Among these modules, the data collection module adopts different data acquisition techniques according to the characteristics of each data source: it collects data from news websites with Nutch and obtains microblog data through the microblog platform's own API. The data preprocessing module uses FudanNLP for Chinese word segmentation, builds a stoplist, and filters out words that carry little meaning, such as auxiliaries, adverbs, and prepositions. It then builds the text vector space from the preprocessed data using the TF-IDF algorithm. For the data clustering module, the thesis proposes a clustering algorithm that combines Kmeans and Canopy with semantic similarity, taking into account characteristics of the Chinese language such as synonymy and polysemy, so as to improve the system's ability to discover network public opinion. The public opinion analysis module implements sensitive-information detection, hot spot detection, and content analysis. The application module presents network public opinion through web pages.

The thesis tests the functions of the Internet public opinion monitoring and analysis system and, by analyzing the simulation results, verifies that the system achieves its goals. Finally, it describes the system's defects and future work.
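
The preprocessing and clustering steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the thesis's Hadoop implementation. The following are assumptions made only for illustration: a whitespace tokenizer stands in for FudanNLP segmentation, the `STOPLIST` and `SYNONYMS` tables are hypothetical, a synonym lookup is a crude stand-in for the semantic similarity measure, and the Canopy thresholds `t1` and `t2` are arbitrary. Canopy is used here only to pick the number of clusters and the initial centers for Kmeans, which is one common way of combining the two.

```python
import math
from collections import Counter

# Hypothetical tables; the thesis builds its own from Chinese text.
STOPLIST = {"the", "a", "of", "is", "in", "very"}
SYNONYMS = {"blaze": "fire", "cost": "price"}

def tokenize(text):
    """Whitespace tokenizer standing in for FudanNLP word segmentation."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return [SYNONYMS.get(w, w) for w in words if w and w not in STOPLIST]

def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF so that terms present in every document keep some weight.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def canopy(vectors, t1=0.9, t2=0.6):
    """Return [(center_index, member_indices)] using cosine distance, t1 > t2."""
    remaining = list(range(len(vectors)))
    canopies = []
    while remaining:
        center = remaining[0]
        dist = [1.0 - cosine(vectors[center], v) for v in vectors]
        # Loose threshold t1 decides membership; canopies may overlap.
        members = [i for i in range(len(vectors)) if dist[i] < t1]
        canopies.append((center, members))
        # Tight threshold t2 removes points from the pool of future centers.
        remaining = [i for i in remaining if i != center and dist[i] >= t2]
    return canopies

def mean_vector(vecs):
    """Component-wise mean of sparse vectors."""
    total = {}
    for v in vecs:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vecs) for t, w in total.items()}

def kmeans(vectors, centers, max_iter=20):
    """Kmeans with cosine similarity, starting from the given centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(len(centers)), key=lambda k: cosine(v, centers[k]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for k in range(len(centers)):
            members = [vectors[i] for i, lab in enumerate(labels) if lab == k]
            if members:
                centers[k] = mean_vector(members)
    return labels

docs = [
    "Fire in the factory district",
    "Blaze damages factory building",
    "Housing price rises in the city",
    "City housing cost is very high",
]
vectors = tfidf_vectors(docs)
canopies = canopy(vectors)
labels = kmeans(vectors, [vectors[c] for c, _ in canopies])
print(len(canopies), labels)
```

Seeding Kmeans from the canopy centers avoids choosing the number of clusters by hand, since it falls out of the thresholds. In this sketch the synonym table merges "blaze" with "fire" and "cost" with "price" before weighting, so documents that use different words for the same topic still share terms.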
Keywords/Search Tags:Public Opinion, Hadoop, Kmeans, Canopy, Semantic similarity