
Research On Web Crawling Technology In Image Search Engine

Posted on: 2007-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: L B Zhou
Full Text: PDF
GTID: 2178360242961989
Subject: Computer software and theory
Abstract/Summary:
Because the Internet is open, distributed, and heterogeneous, the information on it grows explosively, which makes it increasingly difficult for users to obtain the information they need accurately and in a timely manner. It is therefore increasingly important to improve traditional information crawling and retrieval methods so that Web information retrieval systems can find the required information rapidly and accurately while reducing network traffic.

Most crawler-based Web information crawling systems are built on a client/server architecture. Recent improvements have been made on the server side by adopting cluster architectures. Although this increases crawling speed, it does not reduce network traffic. As Web information continues to grow explosively, information crawling and retrieval take longer than before.

This dissertation proposes a mobile-agent-based Web information crawling system that improves on existing crawling systems by running the Web crawlers on a mobile agent platform. The system takes full advantage of mobile agent technology, which fundamentally changes the Web information searching mode from "pull" to "push". The crawlers are implemented as mobile agents that can run on remote Web servers, so most computing-intensive tasks, such as feature extraction and indexing, can be parallelized and carried out on different remote Web servers. Having multiple crawlers cooperatively execute these computing-intensive tasks on different servers greatly improves processing speed. Moreover, only the minimal necessary processing results, such as compressed index data, need to be transferred to the crawler servers, which markedly reduces network traffic and, to some extent, enhances the stability of the system.

To further optimize network traffic and time cost during crawling, the dissertation proposes a self-adaptive migration policy that plans the paths of the mobile crawlers and adjusts the number of mobile crawler agents. Theoretical analysis and performance testing demonstrate that the proposed system outperforms traditional systems.
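
The idea of indexing on the remote host and sending back only compressed index data can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names (`build_index`, `crawl_on_host`, `merge_on_crawler_server`), the JSON-plus-zlib wire format, and the use of a simple text inverted index in place of image feature extraction are all assumptions made for illustration, and no mobile agent platform is modeled.

```python
import json
import zlib
from collections import defaultdict


def build_index(pages):
    """Build a tiny inverted index: term -> sorted list of page URLs.

    `pages` maps URL -> page text (a hypothetical stand-in for the
    content a crawler would read locally on the remote web server).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return {term: sorted(urls) for term, urls in index.items()}


def crawl_on_host(pages):
    """Work a mobile crawler might do on the remote host (assumed design).

    Only the compressed index is returned, never the pages themselves.
    """
    payload = json.dumps(build_index(pages), sort_keys=True).encode("utf-8")
    return zlib.compress(payload)


def merge_on_crawler_server(compressed_results):
    """Decompress and merge the per-host indexes on the crawler server."""
    merged = defaultdict(set)
    for blob in compressed_results:
        index = json.loads(zlib.decompress(blob).decode("utf-8"))
        for term, urls in index.items():
            merged[term].update(urls)
    return {term: sorted(urls) for term, urls in merged.items()}
```

In this sketch each host runs `crawl_on_host` independently, so the indexing work is spread across hosts, and the crawler server receives one compressed blob per host. Whether a blob is actually smaller than the raw pages depends on the content and the index format, so the sketch makes no claim about the size of the saving.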
Keywords/Search Tags:Search Engine, Web Crawler, Mobile Agent, Web Crawling