Design And Implementation Of Crawler System For Public Feelings On Internet

Posted on: 2015-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Li
Full Text: PDF
GTID: 2268330425995302
Subject: Software engineering
Abstract/Summary:
Every minute, the Internet generates a huge amount of information covering many domains, such as daily life, technology, and the military. Some of this information is negative, and for a large company or organization the spread of a negative message can have serious consequences. More and more companies therefore need a customized system for gathering and monitoring information on the Internet. The completeness and timeliness of information gathering are the most important requirements for such a system.

This thesis describes the design and implementation of a crawler system for monitoring public opinion on the Internet. The main work is as follows:

1. Web page downloading and filtering: The system fetches a large number of HTML pages from specified data sources using keyword-directed crawling. Pages that have already been crawled are filtered out.
2. Key information extraction: Key information is extracted from the downloaded HTML files using both an ontology-based extraction method and a custom extraction method.
3. Data updating and storage: Web pages are updated using improved process prediction algorithms together with fixed-time crawling. A shared MongoDB cluster serves as the persistent data store.
4. Job queue and crawler status monitoring: A task queue system controls and manages the status of crawling tasks, and Graphite is used as the real-time monitoring tool.

The research and implementation in this project meet the needs of companies and organizations that are eager to detect negative information.
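
The first item of the abstract (keyword-directed crawling with already-crawled pages filtered out) can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses only the Python standard library, the `crawl` function and `LinkAndTextParser` class are names invented here, and `fetch` is a hypothetical callable supplied by the caller that maps a URL to an HTML string (or `None` on failure). Relevance is assumed to be a plain substring match on page text.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(seeds, keywords, fetch, max_pages=100):
    """Breadth-first crawl that keeps only pages mentioning a keyword.

    `fetch` is a hypothetical callable: url -> HTML string, or None.
    """
    seen = set(seeds)        # URLs already queued or crawled
    frontier = deque(seeds)  # URLs waiting to be downloaded
    matched = []
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        text = " ".join(parser.text_parts).lower()
        if not any(k.lower() in text for k in keywords):
            continue  # off-topic page: neither stored nor expanded
        matched.append(url)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return matched
```

Passing `fetch` in as a parameter keeps the sketch free of network code; a real crawler would also need politeness delays, robots.txt handling, and a persistent record of seen URLs, none of which are shown here.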
Keywords/Search Tags: Public Feelings on Internet, Crawler System, Focused Crawler