
Doubly ranked information retrieval and area search

Posted on: 2006-10-13
Degree: Ph.D
Type: Dissertation
University: University of California, Los Angeles
Candidate: Cao, Yu Uny
GTID: 1458390008459820
Subject: Computer Science

Abstract/Summary:
We propose two extensions to the current generation of Ranked Information Retrieval (RIR): Doubly Ranked Information Retrieval (DRIR) and Area Search.

Doubly Ranked Information Retrieval considers a collection of documents (e.g., a journal), where each document consists of a set of weighted terms. DRIR returns the terms and documents that are most "representative" of that collection. It represents the collection as a document-term weight matrix and uses an iterative procedure to compute the two primary singular vectors of this matrix. One of these singular vectors yields a "signature" of the collection: a set of term-score pairs that is the key to selecting the most representative terms and documents.

To measure how representative the DRIR results are, we propose an analytical metric, Representativeness Error, and show that DRIR's signature is optimal with respect to it. We further modify this metric to obtain the Visibility Representativeness Error, which takes into account how users read the top 10 search results presented to them. Our experiments on both real and artificial data show that DRIR's signature performs well under both Representativeness Error and Visibility Representativeness Error.

Area Search finds, from a given set of document collections, those collections that best match a query. It first computes a signature for each collection. When a user query arrives, Area Search returns the best-matching collections, ranked by a similarity measure between the query and each signature, together with each returned collection's most representative terms and documents. The scheme for computing signatures is central to Area Search; we use DRIR and compare its performance against two other signature schemes. We propose two metrics, Hits and WeightedSimilarity, which are reminiscent of the "recall" and "precision" metrics, respectively, of Ranked Information Retrieval.
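The iterative procedure described above can be sketched with power iteration on the document-term matrix. This is a minimal illustration, not the dissertation's actual implementation: the matrix values, term names, and iteration count below are assumptions made for the example.

```python
# Sketch of DRIR-style signature extraction: power iteration on A^T A
# approximates the primary right singular vector (term scores); the
# primary left singular vector (document scores) follows from it.
# The matrix A and the term names are illustrative assumptions.
import numpy as np

def primary_singular_vectors(A, iterations=100):
    """Approximate the two primary singular vectors of A by power iteration."""
    rng = np.random.default_rng(0)
    v = rng.random(A.shape[1])          # initial guess for term scores
    for _ in range(iterations):
        v = A.T @ (A @ v)               # one step of power iteration on A^T A
        v /= np.linalg.norm(v)
    u = A @ v                           # corresponding document scores
    u /= np.linalg.norm(u)
    return u, v

# A hypothetical document-term weight matrix (3 documents x 3 terms).
A = np.array([
    [3.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 1.0, 4.0],
])
doc_scores, term_scores = primary_singular_vectors(A)

# The signature is the list of (term, score) pairs, highest score first.
terms = ["t1", "t2", "t3"]
signature = sorted(zip(terms, term_scores), key=lambda p: -p[1])
```

The term-score vector defines the signature; sorting it surfaces the most representative terms, and the document-score vector plays the same role for documents.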
We prove theoretical properties of these metrics. Our experiments on both real and artificial data show that DRIR's signature outperforms the other signature schemes in the regions of practical interest defined by the limited screen real estate at the human-machine interface.

Both DRIR and Area Search point toward the next generation of search theory and technology, and are complementary to Web search, the most prominent example of RIR.
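The Area Search matching step, ranking collections by the similarity between a query and each precomputed signature, can be sketched as follows. The signature values, collection names, and the choice of cosine similarity are illustrative assumptions, not details taken from the dissertation.

```python
# Sketch of Area Search matching: given one precomputed signature
# (a term-score map) per collection, return the collections whose
# signatures best match the query. All data here is hypothetical.
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-score dictionaries."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical precomputed signatures, one per collection.
signatures = {
    "J. Databases": {"index": 0.9, "query": 0.4, "storage": 0.2},
    "J. Networks":  {"packet": 0.8, "routing": 0.5, "query": 0.1},
    "J. Retrieval": {"query": 0.7, "ranking": 0.6, "index": 0.3},
}

def area_search(query_terms, signatures, k=2):
    """Rank collections by query-signature similarity; return the top k."""
    q = {t: 1.0 for t in query_terms}   # unit-weight query vector
    ranked = sorted(signatures.items(),
                    key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

best = area_search(["query", "index"], signatures)
```

A full Area Search result would additionally report, for each returned collection, its most representative terms and documents, which the signature itself supplies.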
Keywords/Search Tags: Ranked information retrieval, Search, DRIR, Signature