A study of trigrams and their feasibility as index terms in a full-text information retrieval system

Posted on: 1992-02-27
Degree: D.Sc
Type: Dissertation
University: The George Washington University
Candidate: Adams, Elizabeth Shaw
Full Text: PDF
GTID: 1478390014498822
Subject: Computer Science
Abstract/Summary:
The trigram-based index presented in this dissertation is envisioned as a substitute for the usual word-based index: it retrieves on text fragments with no extra effort while keeping the full-word retrieval capability of word-based systems. This ability follows from the most significant tradeoff between the two systems: the trigram-based model reduces index size at the cost of a corresponding increase in posting list size. Another significant advantage of the trigram-based model is that index size stays constant even as the database continually grows. Because index size is bounded, a complete index can be maintained regardless of database size, so neither the index nor the posting lists need to be reorganized when users' interests change. The trigram-based model, as investigated, performs morphologically based retrievals and thus transfers responsibility for selecting the alternate phrases used in retrievals from indexers and system builders to system users.

Trigram and word distributions in several datasets were studied and compared. Retrieval experiments were run to test trigram-based retrieval performance. In place of standard recall and precision, alternate measures defined as exact match recall and exact match precision were used. The results show that trigram-based retrievals yield exact match precision better than 85% and exact match recall of 100% while needing no more time (as measured by disk accesses) than a word-based system. A technique for improving exact match precision to 100% without additional disk accesses is proposed.
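The general idea of a trigram index can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names (`trigrams`, `build_index`, `candidates`, `search`), the in-memory dictionary, the lowercasing, and the sample documents are all assumptions made for illustration. It shows why intersecting trigram posting lists can return every true match (full exact match recall) and still admit false positives (exact match precision below 100%). The final substring check in `search` is one generic way to remove false positives; it is an assumption, and is not claimed to be the technique proposed in the dissertation.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of overlapping 3-character substrings of text (lowercased)."""
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids containing it (its posting list)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def candidates(index, fragment):
    """Intersect the posting lists of every trigram in the query fragment."""
    grams = trigrams(fragment)
    if not grams:
        return set()  # assumption: fragments shorter than 3 characters are not handled
    posting_lists = [index.get(gram, set()) for gram in grams]
    return set.intersection(*posting_lists)


def search(index, docs, fragment):
    """Keep only candidates that really contain the fragment (illustrative filter)."""
    needle = fragment.lower()
    return {d for d in candidates(index, fragment) if needle in docs[d].lower()}


# Hypothetical sample data, chosen so that one document is a false positive.
docs = {
    1: "full-text retrieval system",
    2: "retriever evaluation",
    3: "word based index",
}
index = build_index(docs)

print(candidates(index, "retrieval"))    # document 2 holds every trigram but not the word
print(search(index, docs, "retrieval"))  # only the true match remains
print(search(index, docs, "triev"))      # a text fragment matches documents 1 and 2
```

Document 2 contains all seven trigrams of "retrieval" (spread across "retriever" and "evaluation") without containing the word itself, which is the kind of mismatch that keeps exact match precision below 100% when only posting lists are consulted.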
Keywords/Search Tags:Index, Trigram, Exact match, System, Retrieval