Sifting and winnowing: Approaches to finding useful information on the web

Posted on:2004-03-17

Degree:Ph.D

Type:Dissertation

University:The University of Wisconsin - Madison

Candidate:Zeidenberg, Matthew

Full Text:PDF

GTID:1468390011971389

Subject:Computer Science

Abstract/Summary:

The Web has started a social transformation in which information flows more widely and could be made more reliable. But we need tools to make this flood of information more useful.

I argue that web search engines, directories, and collaborative filtering systems should be combined into unified systems based on the general concept of information filtering. Such filtering can be used with Web pages, user reputations, and anything subject to a price or a poll. An information filtering mechanism can supplement or perhaps supplant markets and other institutions.

Computers are best at processing large amounts of textual information quickly and making a good first guess at such attributes of a page as the category it belongs in, the other pages it is close to, and its rank relative to other pages. Communities supporting user reputations are best at final judgments.

A system that combines a Web directory and search engine can be built by spidering outward from the initial directory pages and using a multi-resolution version of the Naive Bayes algorithm that classifies the new pages in a scalable manner. I also show that links can add valuable information about the category to which a page belongs.

By collecting human judgments about a set of pages, I find only a weak relationship between these judgments and a count of in-links to these pages. If human judgments are what one is looking for, there is no better way to get them than gathering them directly, through collaborative filtering.

I experiment with various ways of clustering web pages, first by using links and pruning out highly-referenced pages, second by using the WordNet electronic lexicon, and third by using semantic networks constructed from the document text. All of these methods are successful, to varying degrees.

I build a linear system for collaborative filtering with an explicit relation between document and author. Ratings of documents reflect on their authors, and these reflected ratings in turn influence the weight given to the authors' ratings of other documents. Such a system is shown to be stable after relaxation.
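The abstract does not give the equations of the collaborative filtering system, so the following is only an illustrative sketch of the general idea, not the dissertation's model. It assumes ratings lie in [0, 1], that a document's score is the mean of its ratings scaled by each rater's weight, and that a rater's weight is a fixed baseline plus a share (`alpha`, a hypothetical parameter) of the mean score of the documents that rater has authored. Under those assumptions the update is linear and, for `alpha` below 1, repeated relaxation settles to a fixed point.

```python
def relax_reputations(ratings, author_of, alpha=0.5, tol=1e-9, max_iters=1000):
    """Iterate document scores and rater weights until they stop changing.

    ratings:   dict mapping (rater, doc) -> rating in [0, 1]
    author_of: dict mapping doc -> author (every rated doc must appear here)
    alpha:     assumed coupling between an author's document scores and
               the weight given to that author's own ratings (0 <= alpha < 1)
    """
    users = {rater for rater, _ in ratings} | set(author_of.values())
    docs = set(author_of)
    weight = {u: 1.0 for u in users}   # start with everyone fully trusted
    score = {d: 0.0 for d in docs}

    for _ in range(max_iters):
        # Document score: mean of its ratings, each scaled by the rater's weight.
        new_score = {}
        for d in docs:
            received = [(r, v) for (r, dd), v in ratings.items() if dd == d]
            if received:
                new_score[d] = sum(weight[r] * v for r, v in received) / len(received)
            else:
                new_score[d] = 0.0

        # Rater weight: baseline plus alpha times the mean score of authored docs.
        new_weight = {}
        for u in users:
            authored = [new_score[d] for d in docs if author_of[d] == u]
            mean_authored = sum(authored) / len(authored) if authored else 0.0
            new_weight[u] = (1.0 - alpha) + alpha * mean_authored

        change = max(abs(new_weight[u] - weight[u]) for u in users)
        weight, score = new_weight, new_score
        if change < tol:
            break

    return score, weight


# Hypothetical toy data: two authors rate each other's documents,
# and a third user who has written nothing rates both.
example_ratings = {
    ("ann", "d2"): 1.0,
    ("bob", "d1"): 0.2,
    ("cat", "d1"): 0.4,
    ("cat", "d2"): 0.8,
}
example_authors = {"d1": "ann", "d2": "bob"}
scores, weights = relax_reputations(example_ratings, example_authors)
```

In this toy setup the better-rated document raises its author's weight, which in turn makes that author's ratings of other documents count for more. Because each update averages bounded values and `alpha` is below 1, the iteration is a contraction, which is one simple way to obtain the kind of stability the abstract describes.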

Keywords/Search Tags:

Information, Web, Pages

Related items

1	New And Intelligent Embedding Algorithm And New Techniques For Information Hiding In Web Pages
2	Design And Implementation Of Text Information Extracting Modules Of Html Web Pages Based On DOM
3	Study And Design Of Information Integration Model Based On Web Pages Content
4	Research Of Web Information Extraction Method Based On Multi-feature Mining
5	Research And Application Of The Urban Mobile Yellow Pages Information Directed Acquisition And Management Techniques
6	Research On The Technology Of Incremental Web Pages Crawler
7	Basic Education Website Yellow Pages System
8	Extracting Landslide Disaster Information From Web Pages
9	Research On Scheme Of Topic-Specific Web Pages Filtering
10	Research And Application Of Web Pages Denoising And Information Extraction Algorithm