Research On Web Crawler Technology In Search Engine

Posted on:2010-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Guo

Full Text:PDF

GTID:2178330332988356

Subject:Computer system architecture

Abstract/Summary:

With the development of the Internet and the exponential growth of web information, search engines have become an indispensable tool for retrieving information. For most search engines, how to use limited system resources to collect pages effectively and efficiently has become an active research topic. This thesis explores a web crawler system and studies its core algorithms in depth. First, it analyzes the principles and architecture of a search engine, discusses web crawler fetching strategies, and proposes an improved fetching strategy based on page depth and weighted back-link count. Second, it designs several critical algorithms, including multi-threaded web crawling, duplicate URL elimination, and web page scheduling. In addition, considering the characteristics of Chinese search engines, a conversion of Chinese character encodings is provided so that pages are stored in a unified encoding. Moreover, a DNS cache mechanism is applied to speed up collection. Finally, an incremental crawling mechanism is applied to reduce the time and resources spent on re-collecting web pages that have not changed during the crawl cycle. The experimental results show that the performance of the web crawler system meets the search engine's requirements for mass data processing.
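
The abstract does not give the details of the fetching strategy, so the following Python sketch is only an illustration of how page depth and back-link count could jointly order a crawl frontier while duplicate URLs are eliminated. The `priority` formula, its weights, and the names `normalize` and `Frontier` are assumptions made for this example and are not taken from the thesis.

```python
import heapq
from urllib.parse import urldefrag, urlsplit, urlunsplit


def normalize(url):
    """Canonicalize a URL so trivially different spellings dedupe together.

    Assumption: URLs carry no userinfo, so lowercasing the netloc is safe.
    """
    url, _fragment = urldefrag(url)          # drop "#fragment"
    parts = urlsplit(url)
    path = parts.path or "/"                 # treat an empty path as "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def priority(depth, backlinks, depth_weight=1.0, backlink_weight=0.5):
    """Hypothetical score (lower is fetched earlier).

    Shallow pages and pages with many known back-links are preferred.
    The linear form and the weights are illustrative assumptions.
    """
    return depth_weight * depth - backlink_weight * backlinks


class Frontier:
    """Priority queue of URLs to fetch, with duplicate elimination."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = 0                    # tie-breaker: insertion order

    def add(self, url, depth, backlinks=0):
        """Queue a URL; return False if it was already seen."""
        key = normalize(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        score = priority(depth, backlinks)
        heapq.heappush(self._heap, (score, self._counter, key))
        self._counter += 1
        return True

    def pop(self):
        """Return the normalized URL with the best (lowest) score."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

In this sketch the score is fixed when a URL is queued. A real crawler would likely need to update scores as new back-links are discovered, and a crawler handling mass data would typically replace the in-memory `set` with a more compact structure.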

Keywords/Search Tags:

Web Crawler, Search Engine, Information Retrieval

Related items

1	Research On Web Crawler Technology In Search Engine
2	Architecture And Optimization Of Parallel Crawler
3	The Research And Realization Of Vertical Search Engine System Based On Nutch For Medicine
4	Research Of Intranet Information Supervision System Based On Net Crawler And Full-text Search Engine
5	Design And Realization Of The Search Engine System For Campus Network
6	Research On Data Store Of Search Engine
7	Research And Implementation Of A Time-based Focused Search Engine
8	Research And Implementation Of Vertical Search Engine Based On Distributed High-Precision Collector
9	Research Of A Distributed Web Crawler Search Engine Based On Web Information Collection
10	Research And Application Of Focusing Crawler Which Faced Vertical Search Engine