Connecting link structure and content on the Web for effective focused crawling

Posted on: 2004-07-18
Degree: M.C.Sc
Type: Thesis
University: Dalhousie University (Canada)
Candidate: Nickerson, Adam Stuart
Full Text: PDF
GTID: 2458390011955901
Subject: Computer Science
Abstract/Summary:
One approach to designing domain-specific search engines and personal information retrieval agents is to use a focused web crawler. Focused web crawlers try to follow the links that are most likely to lead to relevant documents. Previous focused crawlers have done this by taking advantage of the content within a web page and the link structure of the World Wide Web, but they fail to capture the systematic relationships between pages on either side of hyperlinks.

The relationships between content on either side of hyperlinks are studied. A model for language distribution in technical documents is shown to be insufficient to model the variety of documents found on the Web.

A word graph is defined, which is a projection of both the information within and the links between web pages. The implications of the observed relationships between the link structure and content on the Web with respect to using word graphs for focused crawling in practice are discussed.
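
To make the idea of following the most promising links first concrete, here is a minimal sketch of a generic best-first focused crawler over a toy in-memory web. It is not the method of the thesis and does not use word graphs. The names `relevance` and `focused_crawl`, the term-fraction scoring, and the toy `pages` and `links` data are all assumptions made for illustration. Each unvisited link is prioritised by the relevance of the page it was found on, which is the kind of assumption about content on either side of a hyperlink that the abstract refers to.

```python
import heapq


def relevance(text, topic_terms):
    """Toy relevance score (an assumption): fraction of words that are topic terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def focused_crawl(pages, links, seed, topic_terms, budget):
    """Best-first crawl that visits at most `budget` pages.

    A link's priority is the relevance of the page it was discovered on,
    so links found on relevant pages are followed first.
    """
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        score = relevance(pages[url], topic_terms)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-score, target))
    return visited


# Hypothetical toy web: page text and outgoing links, keyed by page name.
pages = {
    "seed": "web crawler index",
    "a": "crawler search engine crawler",
    "b": "cooking recipes pasta",
    "c": "focused crawler relevance",
    "d": "pasta sauce",
}
links = {"seed": ["a", "b"], "a": ["c"], "b": ["d"]}
topic_terms = {"crawler", "focused", "search"}

order = focused_crawl(pages, links, "seed", topic_terms, budget=4)
print(order)
```

In this toy run the crawler reaches page `c` (found on the relevant page `a`) before page `b`, whereas a breadth-first crawl would visit `b` first. A real crawler would fetch pages over HTTP and use a stronger relevance model; both are left out to keep the sketch self-contained.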
Keywords/Search Tags: Focused, Structure and content, Link structure, Information