The Research And Implementation Of Main Arithmetic In Focused Crawler

Posted on:2006-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Hu

Full Text:PDF

GTID:2168360152982655

Subject:Computer applications

Abstract/Summary:

With the rapid development of Internet technology, more and more types of information are available on the World Wide Web (WWW). However, this brings several problems. First, web pages are more complex than traditional text documents. Second, the WWW is a highly dynamic information source. Third, the WWW serves many different types of users. Finally, only a small fraction of the information on the WWW is useful or relevant to any individual user's needs.

These problems drive research on finding and using information on the WWW effectively. Focus crawling has been put forward to address them, as it provides different services to meet the needs of individual users.

This paper deals with two main algorithms in Focus crawling: web page filtration and URL ordering.

This paper proposes a filtering algorithm derived from the DDBCUR clustering algorithm, which is able to filter out irrelevant web pages quickly.

Previous research has shown that the distribution of web pages satisfies two locality properties. Based on this, this paper proposes a URL ordering algorithm augmented with learning. This algorithm is both simple and effective.

The main new points in this paper are:

On algorithms: First, this paper proposes a clustering algorithm, DDBCUR, which is based on hierarchical and density-based clustering algorithms. Second, it puts forward a URL ordering algorithm that is simpler and faster.

On programming: The development of an experimental intelligent Focus crawler system.
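The URL ordering idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the thesis, whose details the abstract does not give. It only shows the general pattern of a crawl frontier that visits the most promising URLs first. The names `UrlFrontier` and `score_link`, the `weight` parameter, and the scoring rule (a link inherits part of its score from the relevance of the page that contains it) are all assumptions made for illustration.

```python
import heapq
from itertools import count


def score_link(parent_relevance, anchor_relevance, weight=0.5):
    """Hypothetical scoring rule: blend the relevance of the linking page
    with the relevance of the link's anchor text. Both inputs are assumed
    to be numbers in [0, 1] produced by some page filter."""
    return weight * parent_relevance + (1.0 - weight) * anchor_relevance


class UrlFrontier:
    """Crawl frontier that returns the highest-scoring unvisited URL first."""

    def __init__(self):
        self._heap = []        # entries are (-score, tie_breaker, url)
        self._best = {}        # best score seen so far for each URL
        self._done = set()     # URLs already handed out by pop()
        self._tie = count()    # keeps insertion order stable for equal scores

    def push(self, url, score):
        """Add a URL, or raise its priority if a better score is found."""
        if url in self._done:
            return
        if score <= self._best.get(url, float("-inf")):
            return
        self._best[url] = score
        heapq.heappush(self._heap, (-score, next(self._tie), url))

    def pop(self):
        """Return (url, score) for the best pending URL, or None if empty."""
        while self._heap:
            neg_score, _, url = heapq.heappop(self._heap)
            if url in self._done:
                continue       # stale entry left behind by a later, better push
            self._done.add(url)
            return url, -neg_score
        return None
```

In a crawler loop, each fetched page would be scored for relevance, and every outgoing link would be pushed with `score_link(...)` so that links found on relevant pages are fetched before links found on irrelevant ones.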

Keywords/Search Tags:

Clustering, Web page filtration, Topic distillation, URL ordering, Focus crawling

Related items

1	Key Technology Research On Web Forums Crawling And Hot Topic Detection
2	Research On Topic Web Page Crawling Strategy For Vertical Search Engine
3	Study On Focused Crawling Technique For Vertical Search Engine
4	Vertical Search Engine For Crawling The Web Page Design And Implementation
5	Research And Implementation Of Focus Crawling Spider Based On A. T. C And Optimized Hyperlink Chosen Strategy
6	The Theme Of The Search Engine Web Spider Search Strategy Study
7	Research On Technology Of Software Component Obtaining From The Internet
8	Research On Efficient Web Information Crawling Strategy
9	Research And Implementation Of Network Scanning Technology Based On Intelligent Crawling Algorithm
10	Research On Focused Crawling Technique For Vertical Search Engine