Requirements, Design, and Implementation of the Crawler Subsystem of a Website Information Acquisition System

Posted on:2012-01-17

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhao

GTID:2218330335998190

Subject:Software engineering

Abstract/Summary:

Since the mid-1990s, the Internet has become a platform for important social activities such as government, business, education, and entertainment, owing to its independent information sources, ease of access, wide geographical reach, and low maintenance cost. As a result, people pay increasing attention to Internet security, which differs from traditional security. Traditional search engines, however, cannot provide customized services, and their results are not timely enough for some specific requirements. We designed a simple spider system that executes customized tasks in a timely manner. Unlike a traditional search engine, which handles one huge global task, our system targets a limited number of websites and narrows the search scope as far as possible by restricting the search width (a limited number of sites) and depth (the maximum URL depth), in order to meet users' strict real-time requirements. Furthermore, to achieve high parallelism, we split each task into many sub-tasks and schedule them with a consistent hashing algorithm. The algorithm keeps the crawlers' workloads balanced and minimizes the reassignment of sub-tasks when the number of crawlers increases or decreases. Tests on a set of specified websites show that the crawler system is efficient, scalable, and robust.
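The sub-task scheduling described in the abstract can be illustrated with a minimal consistent hashing sketch. This is not the thesis's implementation: the class name `ConsistentHashRing`, the use of MD5, the 100 virtual nodes per crawler, and the crawler and site names are all assumptions made for illustration. The sketch shows the property the abstract relies on: when a crawler joins, only the sub-tasks that land on the new crawler are reassigned.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an assumed, stable choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical scheduler: assigns sub-tasks (e.g. site URLs) to crawlers."""

    def __init__(self, replicas: int = 100):
        self.replicas = replicas  # virtual nodes per crawler, to even out load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> crawler name

    def add_crawler(self, name: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{name}#{i}")
            if point in self._owner:
                continue  # position already taken; skip this virtual node
            bisect.insort(self._points, point)
            self._owner[point] = name

    def remove_crawler(self, name: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != name]
        self._owner = {p: n for p, n in self._owner.items() if n != name}

    def assign(self, subtask: str) -> str:
        """Return the crawler owning the first ring position after the sub-task."""
        if not self._points:
            raise RuntimeError("no crawlers registered")
        idx = bisect.bisect_right(self._points, _hash(subtask)) % len(self._points)
        return self._owner[self._points[idx]]


# Usage: add a fourth crawler and see which sub-tasks change owner.
ring = ConsistentHashRing()
for name in ("crawler-a", "crawler-b", "crawler-c"):
    ring.add_crawler(name)

subtasks = [f"http://site{i}.example/" for i in range(1000)]
before = {t: ring.assign(t) for t in subtasks}

ring.add_crawler("crawler-d")
after = {t: ring.assign(t) for t in subtasks}

# Sub-tasks that moved all went to the new crawler; the rest kept their owner.
moved = [t for t in subtasks if before[t] != after[t]]
print(f"{len(moved)} of {len(subtasks)} sub-tasks were reassigned")
```

Virtual nodes are one common way to approach the balanced workload the abstract mentions; with a plain modulo scheme (`hash(task) % n`), changing the number of crawlers would instead reassign most sub-tasks.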

Keywords/Search Tags:

Parallel crawlers, Task allocation, Crawler management, Consistent hashing algorithm

Related items

1	Multi-robot Task Scheduling Method Based On Parallel Computing
2	Research On Performance And Power Consumption Optimization Of Distributed Cache System Based On Consistent Hash
3	Research On Adaptive Task Allocation Algorithm In Wireless Sensor Network
4	Research And Design Of Auction-Based Multi-Agent Task Allocation In Storage
5	Auction-based Multi-agent Task Allocation In Smart Logistic Center
6	Auction-Based Multi-Agent Task Allocation In Smart Logistic Center
7	Research On The Key Techniques For Parallel File Storage System
8	Service Data Management In Integrated Sensing Network
9	The Research Of Task Allocation And Scheduling Strategy On PVM System
10	Efficient task scheduling and allocation for two-dimensional mesh-connected parallel systems