Research on and Implementation of Distributed Web Crawler Technology

Posted on: 2007-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Su
Full Text: PDF
GTID: 2178360185986122
Subject: Computer Science and Technology
Abstract/Summary:
With information on the Web expanding rapidly, the number of Web-related services has grown accordingly. Web information is now used in many fields, and users' requirements have become increasingly demanding, so the Web crawler, which is responsible for gathering Web information, faces a real challenge. Several large companies, both overseas and domestic, have solved this problem well, and their products are in practical use. However, large-scale search engines can only offer a general-purpose service that cannot be customized, so they cannot accommodate users' varied demands, while a stand-alone Web crawler cannot meet demand in most situations. A medium-sized Web crawler can be customized conveniently and performs well enough to solve the problem. We therefore studied the technology of the distributed Web crawler.

In Web crawler research, the most important issues are the structural design and the solutions to the key technical problems. Building on the work of others, we describe the structural design of a distributed Web crawler, including the organization of the hardware and the partitioning of the software into modules. In this thesis, one PC serves as the main node and the other PCs serve as common nodes, all connected over a LAN. The software architecture comprises the main-node design and the common-node design. We then analyze solutions to the major technical problems of the distributed Web crawler, such as how the crawler's nodes cooperate with each other, how tasks are distributed, and how to keep important Web pages fresh. We propose several practical algorithms to solve these problems. In addition, we implemented a robust, extensible, customizable, distributed Web crawler and analyzed it in detail. Finally, we present the results of two experiments: a general test and a site download test.
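The abstract does not say which task-distribution algorithm the thesis uses. As a rough illustration of how a main node might divide URLs among common nodes, the sketch below assumes a common approach, hashing each URL's host name, so that all pages of one site go to the same node. The function names `assign_node` and `partition` are hypothetical and are not taken from the thesis.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, node_count: int) -> int:
    """Map a URL to a common-node index by hashing its host name.

    Assumption: host-based hashing, not necessarily the thesis's method.
    """
    host = (urlsplit(url).hostname or "").lower()
    digest = hashlib.md5(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    return int.from_bytes(digest[:8], "big") % node_count


def partition(urls, node_count):
    """Split a batch of URLs into one work queue per common node."""
    queues = [[] for _ in range(node_count)]
    for url in urls:
        queues[assign_node(url, node_count)].append(url)
    return queues
```

Under this assumption, every URL from the same host lands in the same queue, which makes it easier for one node to enforce per-site politeness limits. A drawback is that changing the number of nodes reassigns most hosts.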
Keywords/Search Tags: Web Crawler, Parallel, Search Engine