The Research And Implementation Of Main Arithmetic In Focused Crawler

Posted on: 2006-06-25  Degree: Master  Type: Thesis
Country: China  Candidate: Z Y Hu
GTID: 2168360152982655  Subject: Computer applications
Abstract/Summary:
With the rapid development of Internet technology, more and more types of information are available on the World Wide Web (WWW). This growth, however, brings several problems. First, web pages are more complex than traditional text documents. Second, the WWW is a highly dynamic information source. Third, the WWW serves many different kinds of users. Finally, only a small fraction of the information on the WWW is useful or relevant to any individual user's needs. Together, these problems motivate research into finding and using information on the WWW effectively. Focused crawling has been proposed to address them, because it can provide different services tailored to the needs of individual users.

This paper addresses two main algorithms in focused crawling: web page filtering and URL ordering. It proposes a filtering algorithm derived from the DDBCUR clustering algorithm, which can quickly filter out irrelevant web pages. Previous research has shown that the distribution of web pages exhibits two kinds of locality. Building on this, the paper proposes a URL ordering algorithm that incorporates learning; the algorithm is both simple and effective.

The main contributions of this paper are:
On algorithms: First, it proposes DDBCUR, a clustering algorithm that combines hierarchical and density-based clustering. Second, it puts forward a URL ordering algorithm that is simpler and faster.
On implementation: It develops an experimental intelligent focused crawler system.
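The URL ordering idea summarized above can be illustrated with a generic best-first crawl frontier. This is a minimal sketch, not the thesis's own ordering algorithm or its DDBCUR-based filter. It assumes (hypothetically) that each fetched page already has a relevance score in [0, 1] from some filter, that outlinks inherit priority from the page that links to them (one common reading of link locality), and that a per-host running mean of observed relevance stands in for the "learning" component. The class name `URLFrontier`, the weight `alpha`, and the per-host statistic are illustrative assumptions.

```python
import heapq
from collections import defaultdict
from typing import Iterable, Optional
from urllib.parse import urlparse


class URLFrontier:
    """Best-first crawl frontier (hypothetical sketch).

    Priority of an unvisited URL is a weighted mix of:
      * the relevance of the page that linked to it (link locality), and
      * the running mean relevance observed so far for its host
        (a simple learned statistic, updated as pages are judged).
    """

    def __init__(self, alpha: float = 0.7):
        self.alpha = alpha                   # weight on parent-page relevance
        self._heap = []                      # entries: (-score, seq, url)
        self._seq = 0                        # tie-breaker: FIFO among equal scores
        self._seen = set()                   # URLs already enqueued
        self._host_sum = defaultdict(float)  # total relevance seen per host
        self._host_cnt = defaultdict(int)    # number of pages judged per host

    def _host_mean(self, url: str) -> float:
        host = urlparse(url).netloc
        n = self._host_cnt[host]
        return self._host_sum[host] / n if n else 0.0

    def record(self, url: str, relevance: float) -> None:
        """Learn from the relevance score of a page that was fetched and judged."""
        host = urlparse(url).netloc
        self._host_sum[host] += relevance
        self._host_cnt[host] += 1

    def add_links(self, parent_relevance: float, links: Iterable[str]) -> None:
        """Enqueue the outlinks of a page whose relevance was parent_relevance."""
        for url in links:
            if url in self._seen:
                continue
            self._seen.add(url)
            score = (self.alpha * parent_relevance
                     + (1 - self.alpha) * self._host_mean(url))
            heapq.heappush(self._heap, (-score, self._seq, url))  # max-heap via negation
            self._seq += 1

    def pop(self) -> Optional[str]:
        """Return the highest-priority URL, or None if the frontier is empty."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        return url


if __name__ == "__main__":
    frontier = URLFrontier(alpha=0.7)
    frontier.record("http://a.example/topic", 0.9)  # a relevant page on host a
    frontier.add_links(0.9, ["http://a.example/x", "http://b.example/y"])
    frontier.add_links(0.1, ["http://c.example/z"])
    print(frontier.pop())  # http://a.example/x
```

In this sketch a score is fixed when the URL is enqueued, so host statistics learned later do not re-rank URLs already waiting in the queue; a fuller design might re-score lazily when a URL is popped.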
Keywords/Search Tags: Clustering, Web page filtering, Topic distillation, URL ordering, Focused crawling