
Design And Implementation Of Inar Web Crawler

Posted on: 2007-05-08    Degree: Master    Type: Thesis
Country: China    Candidate: L B Lin
GTID: 2178360212467039    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, the information available on the web is growing at an explosive rate. Finding the desired information quickly and effectively in this ocean of websites has become a persistent problem. Search engines emerged to address it: through a search engine, users can move across web pages hosted on different sites and in different locations and obtain diverse, useful information. The web crawler plays a central role in a search engine system. As the engine's data source, it determines both the diversity of the content the system holds and how promptly that content is updated.

This thesis first introduces the categories and components of search engines and gives a brief overview of their internal operation. It then describes how a typical web spider runs, briefly analyzes the search strategies and the main technical problems web spiders face, and examines their internal structure through three specific examples. Finally, it presents a detailed analysis of the system architecture and implementation of a web crawler named "Inar" (Information Navigation And Retrieval). The research covers the following aspects:

(1) Based on an analysis of common web crawlers, we propose the system architecture of "Inar" and explain its core internal structure through its key data structures.
(2) After describing the main modules of "Inar", i.e., URL Scheduler, DNS Resolver, Connecting, Asyn I/O, HTML and URL Filter, we detail the steps of implementing "Inar" on the Linux platform in C/C++.
(3) We analyze the design of the update strategy of "Inar" and propose a mechanism for locating and updating web crawlers so that they operate more independently and effectively.
(4) We analyze the data used in our experiments. Through...
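The abstract names a URL Scheduler and a URL Filter among Inar's modules but does not describe their data structures. As a rough illustration only, the C++ sketch below assumes a FIFO queue for the scheduler (which yields breadth-first crawling) and an in-memory hash set for duplicate filtering. The names `UrlFrontier` and `normalize_url`, and the normalization rules (lower-case scheme and host, drop the `#fragment`), are hypothetical and not taken from the thesis.

```cpp
#include <cctype>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical normalization: drop any "#fragment", then lower-case the
// scheme and host (everything before the first '/' after "://").
// The path is left unchanged because paths may be case-sensitive.
std::string normalize_url(const std::string& url) {
    std::string u = url.substr(0, url.find('#'));  // npos keeps the whole string
    std::size_t scheme_end = u.find("://");
    std::size_t host_start = (scheme_end == std::string::npos) ? 0 : scheme_end + 3;
    std::size_t host_end = u.find('/', host_start);
    if (host_end == std::string::npos) host_end = u.size();
    for (std::size_t i = 0; i < host_end; ++i)
        u[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(u[i])));
    return u;
}

// Hypothetical combination of a URL Scheduler (FIFO queue) and a
// URL Filter (set of every URL ever queued).
class UrlFrontier {
public:
    // Queues the URL and returns true if it has not been seen before;
    // returns false if the filter rejects it as a duplicate.
    bool push(const std::string& url) {
        std::string key = normalize_url(url);
        if (!seen_.insert(key).second) return false;  // already seen
        queue_.push_back(std::move(key));
        return true;
    }

    // Returns the next URL to fetch in FIFO order, or std::nullopt if empty.
    std::optional<std::string> pop() {
        if (queue_.empty()) return std::nullopt;
        std::string next = std::move(queue_.front());
        queue_.pop_front();
        return next;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::deque<std::string> queue_;          // URLs waiting to be fetched
    std::unordered_set<std::string> seen_;   // URLs already queued or fetched
};
```

For example, pushing `HTTP://Example.COM/a#top` and then `http://example.com/a` queues only one entry, because both normalize to `http://example.com/a`. A production crawler would more likely partition the queue by host to stay polite to each server and keep the seen set on disk or in a compact hashed form once it outgrows memory; the sketch ignores both concerns.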
Keywords/Search Tags:web, web crawler, asyn I/O, single thread