URL-Based Analysis of the Thematic Network Robot

Posted on: 2010-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Hou
Full Text: PDF
GTID: 2208360275983496
Subject: Circuits and Systems
Abstract/Summary:
As an important part of a search engine, a web robot automatically downloads web pages from the Internet, helping the search engine gather pages. A web robot starts from a set of seed links and then traverses the web. A topical web robot, however, is not only a tool for downloading pages: it can also recognize the topical relevance of links and of page content. Its main goal is not only to achieve a satisfactory recall rate but also to improve precision, providing the search engine with a topical web warehouse. As an important direction in search engine development, the topical web robot has become a research hot spot in search engine technology. The main characteristics and research work of this thesis are as follows:
1. An improved FICA algorithm is introduced that quickly and simply ranks URLs on the same level by importance, so that the robot can access the more important pages first.
2. An improved Sydney Strategy algorithm is introduced that not only effectively controls the number of URLs in the temporary queue and largely guarantees coverage, but also exploits the fact that adjacent URLs tend to be relevant to the same subject.
3. A new tunneling method is introduced: while ensuring that the topical web robot reaches subject pages quickly, a sub-thread traverses the non-subject URLs obtained from the main thread, selects the relevant URLs from the results, and hands them back to the main thread.
4. The improved FICA algorithm, the improved Sydney Strategy algorithm, the KNN algorithm, and the proposed tunneling method together make up a fast, efficient, and intelligent topical web robot system based on URL analysis. The thesis describes its overall design flow, system structure, and thread design, and then introduces several important modules and key technologies of the system.
Keywords/Search Tags: topical web robot, tunneling method, URL analysis
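
The tunneling idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it does not reproduce the improved FICA or Sydney Strategy algorithms or the KNN classifier, and it runs the tunnel synchronously rather than in a sub-thread. The link graph `LINKS`, the topic vocabulary `TOPIC_TERMS`, the scoring function `url_score`, and the parameter `max_tunnel_depth` are all hypothetical names chosen for illustration. The sketch only shows the general pattern: on-topic URLs go into a priority queue, off-topic URLs are explored to a bounded depth, and any on-topic URLs found behind them are handed back to the main queue.

```python
import heapq

# Hypothetical in-memory link graph standing in for the web (no network access).
LINKS = {
    "seed/robot-index": ["site/robot-arm", "site/news", "site/robot-vision"],
    "site/robot-arm": ["site/robot-gripper"],
    "site/news": ["site/sports", "site/weather"],
    "site/sports": ["site/robot-soccer"],
    "site/weather": [],
    "site/robot-vision": [],
    "site/robot-gripper": [],
    "site/robot-soccer": [],
}

TOPIC_TERMS = ("robot",)  # assumed topic vocabulary


def url_score(url):
    """Toy URL-based relevance: number of topic-term occurrences in the URL."""
    return sum(url.count(term) for term in TOPIC_TERMS)


def tunnel(start, max_depth, seen):
    """Walk through off-topic URLs up to max_depth levels; return on-topic URLs found."""
    found, level = [], [start]
    for _ in range(max_depth):
        next_level = []
        for url in level:
            for child in LINKS.get(url, []):
                if child in seen:
                    continue
                seen.add(child)
                if url_score(child) > 0:
                    found.append(child)       # on-topic: report to the caller
                else:
                    next_level.append(child)  # off-topic: keep tunneling
        level = next_level
    return found


def crawl(seed, max_tunnel_depth=2):
    """Best-first crawl; off-topic links are explored only through the tunnel."""
    frontier = [(-url_score(seed), seed)]  # max-heap via negated score
    seen, visited = {seed}, []
    while frontier:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for child in LINKS.get(url, []):
            if child in seen:
                continue
            seen.add(child)
            if url_score(child) > 0:
                heapq.heappush(frontier, (-url_score(child), child))
            else:
                # Hand on-topic URLs found behind off-topic pages back to the main queue.
                for hit in tunnel(child, max_tunnel_depth, seen):
                    heapq.heappush(frontier, (-url_score(hit), hit))
    return visited
```

In this toy graph, `site/robot-soccer` is reachable only through two off-topic pages, so `crawl("seed/robot-index")` reaches it with the default tunnel depth of 2 but not with `max_tunnel_depth=1`. The depth bound is the assumed mechanism here for trading coverage against time spent on off-topic pages.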