A Parallel System Of Incremental Web Information Retrieval

Posted on: 2006-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2168360155970790
Subject: Computer application technology
Abstract/Summary:
As the volume of information on the Web grows rapidly, the Web services built on it are multiplying. Web crawling is a basic foundation and an important component of these services, and it is applied in many fields, such as search engines, site structure analysis, and the study of Web graph evolution. However, as users' requirements become more demanding and more varied, traditional scalable Web crawling technology no longer meets their needs well: it cannot gather data adequately or in a timely manner. We therefore study how to crawl information effectively within selected sections of the Web, an approach also called parallel Web crawling. Building on long-term experience in Web crawling and on current developments in parallel Web crawling, this thesis proposes a structural design model for a parallel incremental Web crawler. To download Web pages in parallel, we use multiple threads and take advantage of the latest features of the Java language. A suitable URL dispatching scheme ensures that the threads work in parallel. Page analysis extracts URLs for the threads to download, and a footprint algorithm is used to reduce redundancy. Finally, we present test results which, in line with our expectations, show that the system can effectively improve information-gathering performance.
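The pipeline described in the abstract (multiple download threads, URL dispatching, link extraction through page analysis, and redundancy reduction) could be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the class and method names (`ParallelCrawlSketch`, `Dispatcher`, `offer`, `queueFor`, `extractLinks`, `crawl`) are hypothetical, the host-hash dispatching rule is an assumption, a plain set of already-seen URLs stands in for the footprint algorithm, and an in-memory map replaces real HTTP downloads so the example runs without a network. Incremental re-crawling is not covered.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParallelCrawlSketch {

    /** Hypothetical dispatcher: each new URL goes to exactly one worker queue. */
    static class Dispatcher {
        private final List<BlockingQueue<String>> queues = new ArrayList<>();
        // Assumption: a set of seen URLs stands in for the footprint algorithm.
        private final Set<String> seen = ConcurrentHashMap.newKeySet();
        // URLs that were accepted but not yet fully processed.
        final AtomicInteger pending = new AtomicInteger();

        Dispatcher(int workers) {
            for (int i = 0; i < workers; i++) queues.add(new LinkedBlockingQueue<>());
        }

        /** Assumed rule: all URLs of one host are handled by the same worker. */
        int queueFor(String url) {
            String host = URI.create(url).getHost();
            return Math.floorMod(host == null ? 0 : host.hashCode(), queues.size());
        }

        /** Returns false, and enqueues nothing, if the URL was already seen. */
        boolean offer(String url) {
            if (!seen.add(url)) return false;
            pending.incrementAndGet();
            queues.get(queueFor(url)).add(url);
            return true;
        }

        BlockingQueue<String> queue(int i) { return queues.get(i); }
    }

    // Simplistic page analysis: absolute http(s) links in double-quoted href attributes.
    private static final Pattern HREF = Pattern.compile("href=\"(http[^\"\\s]+)\"");

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) links.add(m.group(1));
        return links;
    }

    /** Crawls the in-memory "web" from the seed and returns the URLs downloaded. */
    static Set<String> crawl(Map<String, String> web, String seed, int workers) {
        Dispatcher d = new Dispatcher(workers);
        Set<String> downloaded = ConcurrentHashMap.newKeySet();
        d.offer(seed);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            BlockingQueue<String> q = d.queue(i);
            Thread t = new Thread(() -> {
                try {
                    // Stop once no accepted URL is waiting or in progress.
                    while (d.pending.get() > 0) {
                        String url = q.poll(10, TimeUnit.MILLISECONDS);
                        if (url == null) continue;
                        String html = web.getOrDefault(url, ""); // stand-in for a download
                        downloaded.add(url);
                        for (String link : extractLinks(html)) d.offer(link);
                        d.pending.decrementAndGet(); // only after children are queued
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return downloaded;
    }

    public static void main(String[] args) {
        Map<String, String> web = Map.of(
            "http://a.example/", "<a href=\"http://a.example/1\">1</a> <a href=\"http://b.example/\">b</a>",
            "http://a.example/1", "<a href=\"http://a.example/\">home</a>",
            "http://b.example/", "<a href=\"http://a.example/1\">1</a>");
        System.out.println(new TreeSet<>(crawl(web, "http://a.example/", 2)));
    }
}
```

Dispatching by host keeps each site's URLs on a single thread, which is one simple way to let the threads proceed in parallel without contending for the same pages; other partitioning rules would fit the same structure.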
Keywords/Search Tags: Web, Information Crawling, Information Gathering, Search Engine, Parallel