
Doubly ranked information retrieval and area search

Posted on: 2006-10-13
Degree: Ph.D
Type: Dissertation
University: University of California, Los Angeles
Candidate: Cao, Yu Uny
GTID: 1458390008459820
Subject: Computer Science

Abstract/Summary:
We propose two extensions to the current generation of Ranked Information Retrieval (RIR): Doubly Ranked Information Retrieval (DRIR) and Area Search.

Doubly Ranked Information Retrieval considers a collection of documents (e.g., a journal), where each document consists of a set of weighted terms. DRIR returns the terms and documents that are most "representative" of that collection. It represents the collection as a document-term weight matrix and uses an iterative procedure to compute the two primary singular vectors of this matrix. One of these singular vectors yields a "signature" of the collection: a set of term-score pairs that is the key to selecting the most representative terms and documents.

To measure how representative the DRIR results are, we propose an analytical metric, Representativeness Error, and show that DRIR's signature is optimal with respect to it. We further modify this metric to obtain the Visibility Representativeness Error, which takes into account how users read the top 10 search results presented to them. Our experiments on both real and artificial data show that DRIR's signature performs well under both Representativeness Error and Visibility Representativeness Error.

Area Search finds, from a given set of document collections, those collections that best match a query. It first computes a signature for each collection. When a user query arrives, Area Search returns the best-matching collections, ranked by a similarity measure between the query and each signature, together with each returned collection's most representative terms and documents. The scheme for computing signatures is central to Area Search; we use DRIR and compare its performance against two other signature schemes. We propose two metrics, Hits and WeightedSimilarity, which are reminiscent of the "recall" and "precision" metrics, respectively, of Ranked Information Retrieval.
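The iterative procedure described above can be sketched with power iteration on the document-term matrix. This is a minimal illustration, not the dissertation's actual implementation: the matrix values, term names, and iteration count below are assumptions made for the example.

```python
# Sketch of DRIR-style signature extraction: power iteration on A^T A
# approximates the primary right singular vector (term scores); the
# primary left singular vector (document scores) follows from it.
# The matrix A and the term names are illustrative assumptions.
import numpy as np

def primary_singular_vectors(A, iterations=100):
    """Approximate the two primary singular vectors of A by power iteration."""
    rng = np.random.default_rng(0)
    v = rng.random(A.shape[1])          # initial guess for term scores
    for _ in range(iterations):
        v = A.T @ (A @ v)               # one step of power iteration on A^T A
        v /= np.linalg.norm(v)
    u = A @ v                           # corresponding document scores
    u /= np.linalg.norm(u)
    return u, v

# A hypothetical document-term weight matrix (3 documents x 3 terms).
A = np.array([
    [3.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 4.0],
])
doc_scores, term_scores = primary_singular_vectors(A)

# The signature is the list of (term, score) pairs, highest score first.
terms = ["t1", "t2", "t3"]
signature = sorted(zip(terms, term_scores), key=lambda p: -p[1])
```

The term-score vector defines the signature; sorting it surfaces the most representative terms, and the document-score vector plays the same role for documents.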
We prove theoretical properties of these metrics. Our experiments on both real and artificial data show that DRIR's signature outperforms the other signature schemes in the regions of practical interest defined by the limited screen real estate at the human-machine interface.

Both DRIR and Area Search point toward the next generation of search theory and technology, and are complementary to Web search, the most prominent example of RIR.
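The Area Search matching step, ranking collections by the similarity between a query and each precomputed signature, can be sketched as follows. The signature values, collection names, and the choice of cosine similarity are illustrative assumptions, not details taken from the dissertation.

```python
# Sketch of Area Search matching: given one precomputed signature
# (a term-score map) per collection, return the collections whose
# signatures best match the query. All data here is hypothetical.
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-score dictionaries."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical precomputed signatures, one per collection.
signatures = {
    "J. Databases": {"index": 0.9, "query": 0.4, "storage": 0.2},
    "J. Networks":  {"packet": 0.8, "routing": 0.5, "query": 0.1},
    "J. Retrieval": {"query": 0.7, "ranking": 0.6, "index": 0.3},
}

def area_search(query_terms, signatures, k=2):
    """Rank collections by query-signature similarity; return the top k."""
    q = {t: 1.0 for t in query_terms}   # unit-weight query vector
    ranked = sorted(signatures.items(),
                    key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

best = area_search(["query", "index"], signatures)
```

A full Area Search result would additionally report, for each returned collection, its most representative terms and documents, which the signature itself supplies.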
Keywords/Search Tags: Ranked information retrieval, Search, DRIR, Signature