Research On Web Crawling Technology In Image Search Engine

Posted on:2007-07-10

Degree:Master

Type:Thesis

Country:China

Candidate:L B Zhou

Full Text:PDF

GTID:2178360242961989

Subject:Computer software and theory

Abstract/Summary:

Because the Internet is open, distributed, and heterogeneous, the information on it grows explosively, making it increasingly difficult for users to obtain the information they need accurately and in a timely manner. It is therefore increasingly important to improve traditional crawling and retrieval methods so that a Web information retrieval system can find the required information quickly and accurately while reducing network traffic.

Most crawler-based Web information crawling systems use a client/server architecture. Recent improvements have been made on the server side by adopting a cluster architecture. Although this increases crawling speed, it does not reduce network traffic, and as Web information continues to grow explosively, crawling and retrieval take longer than before.

This dissertation proposes a mobile-agent-based Web information crawling system that improves on existing crawling systems by running the web crawlers on a mobile agent platform. The system takes full advantage of mobile agent technology, which changes the Web information searching mode from "pull" to "push".

The crawlers are implemented as mobile agents that can run on remote web servers, so that most computing-intensive tasks, namely feature extraction and indexing, can be parallelized and carried out on different remote web servers. Having multiple crawlers cooperatively execute these tasks on different servers greatly improves processing speed. Moreover, only the minimal necessary processing results, such as compressed index data, need to be transferred to the crawler servers, which noticeably decreases network traffic and, to some extent, improves the stability of the system.

To further optimize network traffic and time cost during crawling, the dissertation proposes a self-adaptive migration policy that plans the paths of the mobile crawlers and adjusts the number of mobile crawler agents. Theoretical analysis and performance testing demonstrate that the proposed system outperforms traditional systems.
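
The idea of indexing pages where they are hosted and sending back only compressed index data can be illustrated with a minimal sketch. This is not the dissertation's implementation: the page contents, the function names (build_inverted_index, pack_index, unpack_index), and the choice of JSON plus zlib as the transfer format are all assumptions made for illustration.

```python
import json
import re
import zlib


def build_inverted_index(pages):
    """Map each lowercase word to the sorted list of URLs containing it."""
    index = {}
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, for the sketch only
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index.setdefault(word, []).append(url)
    return {word: sorted(urls) for word, urls in sorted(index.items())}


def pack_index(index):
    """Serialize and compress the index (assumed transfer format: JSON + zlib)."""
    return zlib.compress(json.dumps(index, separators=(",", ":")).encode("utf-8"))


def unpack_index(payload):
    """Inverse of pack_index, as it might run on the crawler server."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# Hypothetical pages that a crawler agent would read locally on the remote web server.
PAGES = {
    "/a.html": "<html><body>" + "<p>red cat photo</p>" * 50 + "</body></html>",
    "/b.html": "<html><body>" + "<p>blue cat image</p>" * 50 + "</body></html>",
}

index = build_inverted_index(PAGES)
payload = pack_index(index)
raw_bytes = sum(len(html.encode("utf-8")) for html in PAGES.values())
print(f"raw pages: {raw_bytes} bytes, index payload: {len(payload)} bytes")
```

In this toy case the payload is much smaller than the raw pages because the pages are highly repetitive; the actual savings in a real crawl would depend on the content and on what the index stores.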

Keywords/Search Tags:

Search Engine, Web Crawler, Mobile Agent, Web Crawling

Related items

1	Based On The Theme Of The Search Engine Of The Mobile Agent
2	Spider Crawling On Mobile Search Research And Implementation Strategy
3	Research On Focused Search Engine Based On Mobile Agent
4	The Research On Focused Crawling Algorithm In Vertical Search Engine
5	Vertical Search Engine For Crawling The Web Page Design And Implementation
6	Design Of A Parallel Web Crawling System
7	Research On Crawling Strategy Of Multi-Agent For Focused Search Engine Technology
8	Research And Application Of Vertical Search Engine Key Technologies Based On The Lucene
9	The Design And Implementation Of Topical Search Engine
10	Research On Topical Crawler Combining Web Page Content And Hyperlink