
The Design And Implementation Of Internet Public Opinion Analyzing System Based On J2EE

Posted on: 2013-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
Full Text: PDF
GTID: 2248330371987914
Subject: Software engineering
Abstract/Summary:
As the volume of information online expands, the amount of content on news sites and forums has grown enormous. In a data set this large, quickly and accurately finding the news items or forum threads that relate to a given topic and interest users becomes increasingly difficult. The results returned by general search engines such as Google and Baidu cover topics too broadly, and their timeliness is hard to guarantee. In addition, some result positions and keywords are paid placements, so neither the timeliness and relevance of the results nor the efficiency of manual retrieval is good enough for users who need precise search within a specific field. To this end, this thesis aims to implement a precise web crawler system whose results closely match on page timeliness, content relevance, and search keywords.

As Internet security becomes an increasingly important part of building a harmonious society and spiritual civilization, guarding against hostile forces and the "network navy" (paid online posters) and listening to the voice of the people matter more and more. This has given rise to a steady stream of domestic companies that build crawlers for public opinion monitoring, and I completed my internship at one such Internet company, which focuses on specific user groups. The public opinion analysis system currently draws its data from major forums (such as West Temple, Tianya, Sina, and Post Bar), news sites (such as Sina and other portals), and the news search result pages of Baidu and Qihoo. Once the recursive crawl of the pages completes, HtmlParser, an open source HTML parsing tool, extracts the main structure of each page; the results are filtered by time, theme, and content and then written to the database for display in the front-end JSP pages. The crawler system supports both scheduled (timer) tasks and ad hoc, immediately triggered runs. The business logic layer uses open source technologies such as Spring, Hibernate, and Struts to build an MVC-based business processing system. The back-end crawling, parsing, and filtering subsystem also uses open source technologies such as Berkeley DB, Apache Lucene, and HtmlParser to strengthen the system.
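
The filtering step described above (by time, theme, and content) could look roughly like the following minimal Java sketch. It is an illustration only, not the thesis implementation: the class names OpinionFilter and Page, the keyword-matching rule, and the maximum-age window are all assumptions introduced here, and only the JDK is used so that the example stands alone without HtmlParser or a database.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: names and rules are illustrative assumptions, not the thesis code.
final class OpinionFilter {

    /** A parsed page reduced to the fields the filter needs (assumed shape). */
    static final class Page {
        final String title;
        final String body;
        final LocalDateTime publishedAt;

        Page(String title, String body, LocalDateTime publishedAt) {
            this.title = title;
            this.body = body;
            this.publishedAt = publishedAt;
        }
    }

    private final List<String> keywords = new ArrayList<>();
    private final Duration maxAge;

    OpinionFilter(List<String> keywords, Duration maxAge) {
        // Normalize keywords once so matching is case-insensitive.
        for (String k : keywords) {
            this.keywords.add(k.toLowerCase(Locale.ROOT));
        }
        this.maxAge = maxAge;
    }

    /** Time filter: keep pages published no earlier than (now - maxAge). */
    boolean isTimely(Page page, LocalDateTime now) {
        return !page.publishedAt.isBefore(now.minus(maxAge));
    }

    /** Theme/content filter: keep pages whose title or body mentions any keyword. */
    boolean isRelevant(Page page) {
        String text = (page.title + " " + page.body).toLowerCase(Locale.ROOT);
        for (String k : keywords) {
            if (text.contains(k)) {
                return true;
            }
        }
        return false;
    }

    /** Returns the pages that pass both filters, in their original order. */
    List<Page> select(List<Page> crawled, LocalDateTime now) {
        List<Page> kept = new ArrayList<>();
        for (Page p : crawled) {
            if (isTimely(p, now) && isRelevant(p)) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

In a system like the one described, the list returned by select would be what gets written to the database for the JSP views. Passing the current time in as a parameter, rather than reading the clock inside the filter, keeps the sketch easy to test and lets a scheduled run and an ad hoc run share the same code path.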
Keywords/Search Tags: Web Crawler, J2EE, Internet Opinion