Research Of Topic-Specific Web Resource Discovery

Posted on: 2006-05-29

Degree: Master

Type: Thesis

Country: China

Candidate: G Ou

GTID: 2168360155961653

Subject: Computer application technology

Abstract/Summary:

Web crawlers have existed for many years, but the recent rapid growth of the World Wide Web poses unprecedented scaling challenges for general-purpose crawlers: they cannot gather all data in a timely manner, and it is hard to find the useful information among what they collect. The focused web crawler has therefore become an active research topic. Its goal is to selectively seek out pages that are relevant to a predefined set of topics, which improves crawler performance and saves hardware and network resources.

In this thesis we introduce the uses, history, current state, and future of focused web crawlers, and analyse the popular algorithms as well as the distribution of topic-relevant pages on the Web. We build a focused crawler with Java and SQL Server 2000 and collect seed URLs from the Web using a metasearch-engine approach. By supplying comprehensive and exact website URLs as seeds, the system simplifies information filtering and crawls efficiently. We also present solutions to problems encountered in parsing HTML syntax and filtering files. Finally, we summarize the capabilities and future directions of the system. The experimental results show that the work is effective and our...
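The abstract does not say how page relevance is computed or how the crawl queue is ordered. As an illustration only, the Java sketch below shows one common way a focused crawler can "selectively seek out" relevant pages: a best-first frontier in which each discovered URL inherits the topic score of the page that linked to it, and links on pages scoring below a threshold are not followed. The class `FocusedFrontier`, its methods, the keyword-fraction relevance score, and the threshold are hypothetical assumptions, not the thesis's actual design; page fetching, HTML parsing, and storage in SQL Server 2000 are omitted.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.PriorityQueue;
import java.util.Set;

/**
 * Hypothetical best-first frontier for a focused crawler (illustrative sketch).
 * Each queued URL inherits the relevance score of the page that linked to it;
 * links found on pages scoring below the threshold are not followed.
 */
public class FocusedFrontier {

    /** A URL waiting to be fetched, with the score inherited from its parent page. */
    static final class Candidate {
        final String url;
        final double score;

        Candidate(String url, double score) {
            this.url = url;
            this.score = score;
        }
    }

    private final Set<String> topicTerms = new HashSet<>();
    private final double threshold;
    // Highest score first, so the most promising URL is fetched next.
    private final PriorityQueue<Candidate> queue =
            new PriorityQueue<>((a, b) -> Double.compare(b.score, a.score));
    // Every URL ever enqueued, so no page is scheduled twice.
    private final Set<String> seen = new HashSet<>();

    public FocusedFrontier(List<String> topicTerms, double threshold) {
        for (String term : topicTerms) {
            this.topicTerms.add(term.toLowerCase(Locale.ROOT));
        }
        this.threshold = threshold;
    }

    /** Assumed relevance measure: fraction of word tokens that are topic terms. */
    public double relevance(String text) {
        int total = 0;
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            total++;
            if (topicTerms.contains(token)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** Seed URLs (e.g. gathered via a metasearch engine) enter with the top score. */
    public void addSeed(String url) {
        if (seen.add(url)) {
            queue.add(new Candidate(url, 1.0));
        }
    }

    /** Scores a fetched page; if it is on-topic, its unseen out-links are enqueued. */
    public double processPage(String pageText, List<String> outLinks) {
        double score = relevance(pageText);
        if (score >= threshold) {
            for (String link : outLinks) {
                if (seen.add(link)) {
                    queue.add(new Candidate(link, score));
                }
            }
        }
        return score;
    }

    /** Returns the next URL to fetch, or null when the frontier is empty. */
    public String next() {
        Candidate c = queue.poll();
        return c == null ? null : c.url;
    }

    public static void main(String[] args) {
        FocusedFrontier frontier = new FocusedFrontier(List.of("crawler", "topic"), 0.2);
        frontier.addSeed("http://seed.example/");
        System.out.println("fetch " + frontier.next());
        // Pretend the seed page was fetched and its links were extracted.
        frontier.processPage("A focused crawler follows topic links",
                List.of("http://a.example/", "http://b.example/"));
        for (String url = frontier.next(); url != null; url = frontier.next()) {
            System.out.println("fetch " + url);
        }
    }
}
```

A fixed keyword threshold keeps the sketch short; a real focused crawler might instead use a trained classifier or a vector-space similarity, and might keep low-scoring links at reduced priority rather than discarding them outright.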

Keywords/Search Tags:

Web Resource Gathering, Topic, Search Engine, Seed

Related items

1	Design & Practice Of Topic-Specific Search Engine System
2	Research On Content Search Engine Based On The Topic Relevance Routing In P2P Networks
3	Research And Implementation On Key Techniques Of Topic Search Engine
4	Research And Implementation Of Meta-search Engine Based On Specialized Search Engine
5	The Research And Implementation On Lucene-Based Topic Search Engine
6	Design And Implementation Of Topic Search Engine Based On The Circuit Curriculum
7	The Research And Application Of Topic-specific Search Engine
8	Design And Realization Of News Search Engine Based On Java
9	The Meta Search Engine Research On Topic Distillation Algorithms
10	Research And Realization On Correlation Techniques Of Topic Search-specific Engine