A classification approach to the automatic reformulation of Boolean queries in information retrieval

Posted on: 1994-08-12
Degree: Ph.D.
Type: Dissertation
University: University of Virginia
Candidate: Kim, Nam-Ho
Full Text: PDF
GTID: 1478390014494047
Subject: Computer Science
Abstract/Summary:
One of the difficulties in using current Boolean-based information retrieval systems is that it is hard for a user, especially a novice, to formulate an effective Boolean query. Users usually employ a trial-and-error search and often have to rely on an expert (e.g., a librarian) in searching for the right information. Query reformulation can be even more difficult and complex than formulation, since the user has greater difficulty in incorporating the new information gained from the previous search into the next query. In this research, query reformulation is viewed as a classification problem (classifying documents as either relevant or nonrelevant), and a new reformulation algorithm is proposed which builds a tree-structured classifier (named the query tree) at each reformulation from a set of feedback documents retrieved from the previous search; the query tree can be easily transformed into a Boolean query.

To compare the performance of the new approach and past Boolean query reformulation algorithms, an evaluation testbed was developed. Its major component is a simulated database which is characterized by the term frequency distributions and an artificial Boolean query; the relevance of documents to a query is judged by the system. The query tree and two of the most important current query reformulation algorithms were compared on benchmark test sets (CACM, CISI, and Medlars) and in the evaluation testbed. The query tree showed significant improvements over the current algorithms in most experiments. We attribute this improved performance to the ability of the query tree algorithm to select good search terms and to represent the relationships among search terms in a tree structure.
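
The general idea of building a tree-structured classifier from feedback documents and reading a Boolean query off it can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the abstract does not state how the query tree chooses its terms, so the information-gain splitting used here is an assumption, and the function names (`build_tree`, `to_boolean_query`, `format_query`, `matches`) and the toy feedback documents are hypothetical. Each path from the root to a "relevant" leaf becomes one AND-clause, and the clauses are joined with OR.

```python
from math import log2


def entropy(labels):
    """Shannon entropy of a list of boolean relevance labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def build_tree(docs, terms):
    """Build a binary tree over term presence.

    docs: list of (set_of_terms, is_relevant) feedback documents.
    Returns a bool leaf (majority label) or (term, present_subtree, absent_subtree).
    Splitting by information gain is an assumption made for this sketch.
    """
    labels = [rel for _, rel in docs]
    majority = sum(labels) * 2 >= len(labels)
    if entropy(labels) == 0.0 or not terms:
        return majority

    def gain(term):
        with_t = [rel for d, rel in docs if term in d]
        without = [rel for d, rel in docs if term not in d]
        n = len(docs)
        split = len(with_t) / n * entropy(with_t) + len(without) / n * entropy(without)
        return entropy(labels) - split

    best = max(sorted(terms), key=gain)  # sorted() makes tie-breaking deterministic
    if gain(best) <= 0.0:
        return majority
    rest = terms - {best}
    present = [(d, r) for d, r in docs if best in d]
    absent = [(d, r) for d, r in docs if best not in d]
    return (best, build_tree(present, rest), build_tree(absent, rest))


def to_boolean_query(tree, path=()):
    """Collect one conjunction of literals per path ending in a relevant leaf."""
    if isinstance(tree, bool):
        return [path] if tree else []
    term, present, absent = tree
    return (to_boolean_query(present, path + (term,))
            + to_boolean_query(absent, path + ("NOT " + term,)))


def format_query(conjunctions):
    """Render the conjunctions as a Boolean query in disjunctive form."""
    return " OR ".join("(" + " AND ".join(c) + ")" for c in conjunctions)


def matches(conjunctions, doc_terms):
    """Evaluate the query against one document's term set."""
    def holds(literal):
        if literal.startswith("NOT "):
            return literal[4:] not in doc_terms
        return literal in doc_terms
    return any(all(holds(lit) for lit in c) for c in conjunctions)


# Hypothetical feedback documents from a previous search: (terms, judged relevant?)
feedback = [
    ({"boolean", "query", "retrieval"}, True),
    ({"boolean", "query"}, True),
    ({"query", "database"}, False),
    ({"retrieval", "index"}, False),
    ({"boolean", "index"}, False),
]
vocabulary = set().union(*(terms for terms, _ in feedback))

tree = build_tree(feedback, vocabulary)
query = to_boolean_query(tree)
print(format_query(query))
```

In an iterative setting, the printed query would be submitted as the next search, the newly retrieved documents judged, and the tree rebuilt from the enlarged feedback set.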
Keywords/Search Tags:Boolean, Query, Information, Reformulation, Search