Research Of Topic-Specific Web Resource Discovery

Posted on: 2006-05-29  Degree: Master  Type: Thesis
Country: China  Candidate: G Ou
GTID: 2168360155961653  Subject: Computer application technology
Abstract/Summary:
Web crawlers have existed for many years. The rapid growth of the World Wide Web now poses unprecedented scaling challenges for general-purpose crawlers: they cannot gather all data in a timely manner, and it is hard to find the useful information among what they collect. The focused web crawler has therefore become a focus of research. Its goal is to selectively seek out pages that are relevant to a set of topics, which improves crawler performance and saves hardware and network resources. In this paper we introduce the uses, history, current state, and future of the focused web crawler, and analyse the popular algorithms and the distribution of topic-relevant pages on the web. We build a focused crawler with Java and SQL Server 2000, collecting seeds from the web based on metasearch engine theory. Providing comprehensive and exact web site URLs simplifies information filtering and makes crawling highly effective. We also give solutions to the problems met in parsing HTML syntax and in file filtering. Finally, we summarize the capability and the future of the system. The experimental results show that the work is effective and our...
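As an illustration of the general idea of focused crawling (selectively following links from pages that are relevant to a topic), here is a minimal best-first sketch in Python. It is not the thesis's Java and SQL Server 2000 implementation. The names `relevance`, `focused_crawl`, and `TOY_WEB`, the keyword-overlap scoring, and the threshold value are all assumptions made for this example, and the "web" is an in-memory dictionary so that no network access is needed.

```python
import heapq


def relevance(text, topic_terms):
    """Toy score: fraction of topic terms that occur in the page text."""
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


def focused_crawl(seeds, fetch, topic_terms, threshold=0.5, max_pages=100):
    """Best-first crawl: URLs are visited in order of their parent's score.

    `fetch(url)` is a hypothetical callback returning (text, links) or None.
    """
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)
        if page is None:
            continue
        fetched += 1
        text, links = page
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic page: keep neither the page nor its links
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return relevant


# Hypothetical five-page web: url -> (page text, outgoing links).
TOY_WEB = {
    "a": ("web crawler search engine", ["b", "c"]),
    "b": ("cooking recipes", ["d"]),
    "c": ("focused crawler topic search", ["e"]),
    "d": ("crawler search", []),
    "e": ("search engine seed crawler", []),
}

result = focused_crawl(["a"], TOY_WEB.get, ["crawler", "search"])
print(result)
```

In this toy graph, page `d` is on topic but is reachable only through the off-topic page `b`, so it is never fetched. This shows the trade-off of pruning links from irrelevant pages: fewer wasted downloads, at the risk of missing relevant pages behind irrelevant ones.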
Keywords/Search Tags: Web Resource Gathering, Topic, Search Engine, Seed