
The Research And Implementation Of Listed Company Public Opinion Mining System Based On Hadoop

Posted on: 2014-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
GTID: 2268330401967125
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development and popularization of the Internet, more and more users can post their views and opinions online. The carriers of Internet information are also diverse: e-mail, portal news, BBS, blogs, microblogs, online communities, instant messaging (IM), and so on. Users can use these resources to browse news, comment on it, and forward others' views. As a result, Internet public opinion (IPO) develops and spreads in ways that are hard to control. For a listed company, crisis IPO has an increasingly serious impact: once an emergency involving the company occurs, news of it spreads through traditional media and the Internet at a geometric rate. Companies therefore need to monitor IPO, follow the trend of crisis IPO, and keep track of their image and reputation on the Internet, all of which bear directly on the company's economic performance and social standing.

Traditional public opinion monitoring systems can serve this purpose, but the volume of data today makes traditional approaches inefficient, and real-time monitoring is difficult to achieve. This thesis therefore researches and develops a listed company public opinion mining system based on Hadoop. Hadoop's distributed processing of massive data addresses this problem and delivers high-performance mining of massive data, helping listed companies monitor crisis public opinion on the Internet.

The system provides an integrated process covering data collection, data mining, data storage, and data display. The main work is as follows:
(1) Use the eTools metasearch engine and Nutch to crawl Internet resources;
(2) Use tools such as jsoup and Tika to extract web content;
(3) Integrate Lucene and Solr to index web pages and provide a search function;
(4) Combine Hadoop and Mahout to mine massive data, finding IPO hot spots and showing the IPO situation based on clustering and classification algorithms;
(5) Store data in XML format, the relational database MySQL, and HDFS.
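
The abstract does not describe how hot spots are computed beyond naming Hadoop, Mahout, clustering, and classification. As a rough illustration of the MapReduce pattern that Hadoop distributes across a cluster, the following sketch counts in how many documents each term appears and ranks the most widespread terms. It is a single-process simulation written for this summary, not the thesis's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `hot_terms`), the sample documents, and the stopword list are all hypothetical, and simple term counting stands in for the clustering step.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit (term, 1) once per distinct term in one document."""
    for term in set(re.findall(r"[a-z]+", text.lower())):
        yield term, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, counts):
    """Reduce step: sum the per-document counts for one term."""
    return term, sum(counts)


def hot_terms(docs, stopwords, top_n=3):
    """Rank terms by the number of documents that mention them."""
    pairs = [kv for doc_id, text in docs.items() for kv in map_phase(doc_id, text)]
    totals = [
        reduce_phase(term, counts)
        for term, counts in shuffle(pairs).items()
        if term not in stopwords
    ]
    # Highest document frequency first; ties broken alphabetically.
    totals.sort(key=lambda tc: (-tc[1], tc[0]))
    return totals[:top_n]


if __name__ == "__main__":
    # Hypothetical sample posts about a listed company.
    sample_docs = {
        "post-1": "Recall of product announced",
        "post-2": "Product recall hurts stock",
        "post-3": "Stock falls after recall",
    }
    print(hot_terms(sample_docs, stopwords={"of", "after"}))
```

On a real cluster the map and reduce functions would run as distributed tasks over pages stored in HDFS, and the shuffle would be handled by the framework rather than by an in-memory dictionary.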
Keywords/Search Tags: Internet public opinion, Web crawler, Hadoop, Data mining