
Research On Web Data Acquisition And Management For Online Public Opinion Analysis

Posted on: 2018-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Tang
Full Text: PDF
GTID: 2348330512483030
Subject: Communication and Information System
Abstract/Summary:
In the age of new network media, the Internet is changing the public opinion environment. People express their views and comments on social issues and hot topics online, and this information reflects public will and emotion, forming an important part of online public opinion. However, most online public opinion data is spread across isolated data sources such as micro-blogs, BBS forums and news websites, which act as information islands. This thesis studies how to collect these isolated data and manage them in a unified way, and then how to analyze online public opinion so that timely warnings can be issued, helping the government and relevant departments take measures to control the development of public opinion. This is practical and meaningful work. After surveying a large body of research papers and related technologies on online public opinion analysis at home and abroad, especially on the collection and management of online public opinion data, the thesis designs and implements a complete online public opinion analysis system. The main work is as follows:

1. Research on methods for collecting online public opinion data from multiple data sources. According to the topic of the public opinion information, appropriate information sources are selected and a corresponding data collection method is designed for each.

2. Research on web news text extraction algorithms. Web news is the main part of the public opinion data acquired, and extracting the text during web page preprocessing is essential. Drawing on extraction approaches based on statistical methods and on web page structure, the thesis first improves a general web news text extraction algorithm, then designs a web news text extraction algorithm based on both statistics and web page structure, and finally compares the precision and speed of the two algorithms.

3. Research on methods for managing massive public opinion data. A Hadoop+HBase distributed system is used to store massive public opinion data. To address the limitations of HBase in data retrieval, the distributed full-text indexing tool Solr is used to implement a two-level index for HBase, solving the problem of retrieving massive Chinese public opinion data.

4. Design and implementation of a complete online public opinion analysis system. Building on the preceding research and design, the thesis constructs and implements a complete public opinion analysis system based on the collection and management of public opinion data. A simple sentiment analysis application is implemented for the collected micro-blog data.
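
To make the idea in item 2 concrete, the following is a minimal sketch of news text extraction that combines a simple statistic (block length and link density) with page structure (block-level tags). It is not the algorithm developed in the thesis, which the abstract does not specify. The names `BlockCollector` and `extract_main_text` and the thresholds `min_chars` and `max_link_density` are hypothetical and chosen only for illustration. For Chinese pages the character threshold would likely need to be lower.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (page-structure signal); an assumed, non-exhaustive set.
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "article", "section"}
# Tags whose content is never visible text.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts the link text in each block."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, link_chars) tuples
        self._buf = []          # text fragments of the current block
        self._link_chars = 0    # characters of the current block inside <a>
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Close the current block, normalising whitespace.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._buf = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._buf.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_chars=40, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        link_density = min(1.0, link_chars / len(text))
        if len(text) >= min_chars and link_density <= max_link_density:
            kept.append(text)
    return "\n".join(kept)
```

Navigation bars and footers tend to be short and link-heavy, so they fall below the length threshold or above the link-density threshold, while article paragraphs pass both checks. A structure-aware variant could additionally restrict the search to the subtree with the highest total text length.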
Keywords/Search Tags: online public opinion, web crawler, information extraction, HBase, sentiment analysis