
Research And Implementation Of Web Directory And Link Relationship Based Spider Crawling Strategy

Posted on: 2010-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Liu
Full Text: PDF
GTID: 2178360278980489
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information available online has grown explosively. Faced with this massive and constantly updated body of information, people rely on search engines as a quick and easy way to find what they need. Users browse pages of search results, but as Web information grows geometrically, how they can quickly reach more, and more valuable, information has become an active research topic. The Web robot, or spider, is an important component of a search engine: it determines the quality of the content of the entire search engine system and whether pages can be updated in time.

This thesis starts from the development and classification of search engines, studies the structure of a search engine and the composition of the spider, and focuses on analyzing spider crawling strategies that are based on high-quality Web pages. By studying the structure of the Web and analyzing the types of links, it designs a new spider crawling strategy that targets both high-quality Web pages and potentially high-quality Web pages. The main research contents are as follows:
1. Based on an analysis of the structure of a typical spider and a study of Jeff Heaton's Spider, design the structure of a spider that uses the strategy proposed in this thesis.
2. Analyze and study several spider crawling strategies that are based on the quality of Web pages.
3. Analyze the structure of the Web and the types of links, and design a new spider crawling strategy that targets both high-quality and potentially high-quality Web pages.
4. Through analysis of the experiments and comparison with the Backlink strategy, demonstrate the feasibility and necessity of the spider crawling strategy designed in this thesis.
5. Summarize the work, analyze it, and give a brief outlook on the next steps for this subject.
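The abstract does not spell out the Backlink strategy it uses as a comparison baseline. As a rough illustration only, the Python sketch below assumes the common reading of that term: the crawler always visits next the frontier URL that has the most links pointing to it from pages crawled so far. The function name `backlink_crawl_order`, the in-memory `graph` dictionary, and the page names are hypothetical, and the sketch does not represent the new strategy proposed in the thesis.

```python
from collections import defaultdict


def backlink_crawl_order(graph, seeds):
    """Return the visit order of a crawl whose frontier is ranked by
    backlink count (links seen so far from already-crawled pages).

    graph: dict mapping a URL to the list of URLs it links to
           (a hypothetical stand-in for fetching and parsing pages).
    seeds: list of starting URLs.
    """
    backlinks = defaultdict(int)            # URL -> in-links observed so far
    frontier = list(dict.fromkeys(seeds))   # de-duplicated, discovery order kept
    seen = set(frontier)
    order = []

    while frontier:
        # Pick the URL with the most known backlinks; on a tie,
        # prefer the one that was discovered earliest.
        best = max(
            range(len(frontier)),
            key=lambda i: (backlinks[frontier[i]], -i),
        )
        url = frontier.pop(best)
        order.append(url)

        # "Crawl" the page: count its out-links as backlinks of their targets.
        for target in graph.get(url, []):
            backlinks[target] += 1
            if target not in seen:
                seen.add(target)
                frontier.append(target)

    return order


if __name__ == "__main__":
    # Toy link graph (assumed data): "c" is linked from both "a" and "b".
    toy_graph = {
        "home": ["a", "b"],
        "a": ["x", "c"],
        "b": ["c"],
        "c": [],
        "x": [],
    }
    print(backlink_crawl_order(toy_graph, ["home"]))
```

In this toy graph, a plain breadth-first crawl would reach "x" before "c", whereas the backlink ordering visits "c" first because two crawled pages already link to it. A strategy of this kind can only rank pages whose in-links have already been observed, which is the sort of limitation that a strategy also aiming at potentially high-quality pages would need to address.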
Keywords/Search Tags: Hyperlink analysis, Web directory, Spider, High-quality web pages