Reference directed indexing: Indexing scientific literature in the context of its use

Posted on: 2003-11-06
Degree: Ph.D.
Type: Dissertation
University: Northwestern University
Candidate: Bradshaw, Shannon Glenn
Full Text: PDF
GTID: 1468390011486189
Subject: Computer Science
Abstract/Summary:
A search engine is only as good as the degree to which it provides people with useful information. Researchers in Information Retrieval (IR) have worked toward this goal by developing measures of query relevance. These techniques determine relevance from statistical measures of how frequently query terms are used in documents. Unfortunately, these techniques, while good at measuring relevance, often do a poor job of identifying information that is actually useful. To be useful, a document must be more than simply relevant to a query; it must have something of interest to say about the topic in question. In recent years, other researchers have developed techniques that determine utility on the basis of some measure of a document's popularity. The Google Internet search engine is an example of this approach. Because these systems are more concerned with popularity than relevance, they regularly identify a few useful documents but many that are irrelevant. Traditional IR approaches, then, accurately determine relevance but not utility, while popularity approaches accurately determine utility but not relevance. In this dissertation, I present an approach that builds on both techniques to combine measures of relevance and utility in a single metric. This technique, called Reference Directed Indexing (RDI), overcomes many of the problems with traditional IR techniques and popularity approaches. I have implemented RDI in a retrieval system for scientific literature called Rosetta. Rosetta compares multiple references to documents to determine what documents are about and the degree to which they are useful. In response to a query, it provides information seekers with the documents to which the greatest number of authors have referred using the words in that query. Relying on what referrers have to say about documents has proven to be a highly effective means of determining what documents are about and which documents on a topic are most useful.
In addition to the retrieval system, I have developed a fully automated Collaborative Query Interface (CQI) based on RDI. The CQI helps users explore an information space and resolve query ambiguity by suggesting related topics and ways of augmenting their queries.
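The retrieval step described for Rosetta, ranking documents by how many referrers describe them using the query's words, can be illustrated with a minimal sketch. This is not Rosetta's implementation, which the abstract does not detail. The function names `build_index` and `search`, the sample data, whitespace tokenization, and the requirement that one referrer use all of the query words are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(references):
    """Map each cited document to its referrers and the words they used.

    `references` is an iterable of (citing_doc, cited_doc, reference_text)
    tuples. This input format is hypothetical.
    """
    index = defaultdict(dict)  # cited_doc -> {citing_doc: set of words}
    for citing, cited, text in references:
        # Assumption: lowercase whitespace tokenization, no stemming.
        index[cited].setdefault(citing, set()).update(text.lower().split())
    return index


def search(index, query):
    """Rank cited documents by the number of referrers using the query words."""
    terms = set(query.lower().split())
    if not terms:
        return []
    scores = {}
    for cited, referrers in index.items():
        # Assumption: a referrer counts only if it used every query word.
        count = sum(1 for words in referrers.values() if terms <= words)
        if count:
            scores[cited] = count
    # Highest referrer count first; ties broken by document name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: (citing paper, cited document, text of the reference).
references = [
    ("paperA", "doc1", "a probabilistic model of citation indexing"),
    ("paperB", "doc1", "citation indexing for scientific literature"),
    ("paperC", "doc2", "citation indexing survey"),
    ("paperC", "doc3", "neural networks"),
]
index = build_index(references)
results = search(index, "citation indexing")
```

In this sketch a document's score reflects both signals the abstract combines: the query words must appear in what referrers say about it (relevance), and more referrers saying so raises its rank (utility).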
Keywords/Search Tags:Information, Useful, Query, Indexing, Documents