
TERM CONFLATION FOR INFORMATION RETRIEVAL

Posted on: 1983-10-28
Degree: Ph.D.
Type: Thesis
University: Syracuse University
Candidate: FRAKES, WILLIAM BRUCE
GTID: 2478390017463925
Subject: Information Science
Abstract/Summary:
One of the fundamental assumptions underlying the design of information retrieval systems is that documents represented by the same terms that are used in a query will be relevant to the query. Simple matching of raw terms from queries and documents will, however, fail to relate variant forms of the same term, such as singular and plural. To solve this problem, a number of conflation methods for reducing term variants to a common form have been proposed.

This study examined two questions concerning conflation. The first concerned the role of root morphemes in determining stems for information retrieval; the thesis adopted here was that roots are the best stems for information retrieval purposes. The second concerned the relative effectiveness of conflation carried out by a computer program, here called stemming, versus conflation carried out by searchers, here called truncation; the adopted thesis was that stemming would perform at least as well as truncation.

Two experiments were carried out to test these theses. In the first, which tested whether roots are the best stems for information retrieval, roots were determined for the truncated terms in title-abstract searches carried out by four experienced searchers on 39 queries. Average positive and negative deviations of the stems from their corresponding roots were calculated for each query and tested for correlation with combined measures of recall and precision. Searchers were found to truncate at or near root boundaries, and small deviations from root boundaries, such as those observed here, did not significantly degrade retrieval results.

In the second experiment, which tested the thesis that stemming would perform at least as well as truncation, title-abstract searches carried out by four experienced searchers on 25 queries were re-executed using stemmer-generated stems in place of the searcher-generated stems, and the performance of the stemmed and truncated searches was compared. No significant difference was found between stemming and truncation, supporting acceptance of the second thesis and indicating that stemmers are capable of relieving searchers of the task of conflation.
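The contrast between raw matching, stemming, and truncation can be illustrated with a toy vocabulary. This is a minimal sketch under stated assumptions: `toy_stem` is a hypothetical suffix stripper with a handful of English suffix rules, not the stemmer used in the thesis, and `truncation_match` models a searcher's right-truncated term (for example `retriev*`) as a plain prefix match. The vocabulary in `DOC_TERMS` is likewise invented for illustration.

```python
# Hypothetical suffix list for illustration only. Order matters: each longer
# suffix is listed before any shorter suffix it ends with.
SUFFIXES = ("ations", "ation", "ings", "ing", "als", "al",
            "ies", "es", "ed", "s", "e")


def toy_stem(term: str, min_stem: int = 3) -> str:
    """Strip the first listed suffix that matches, leaving at least min_stem characters."""
    term = term.lower()
    for suffix in SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= min_stem:
            return term[: -len(suffix)]
    return term


def raw_match(query_term: str, doc_terms: list[str]) -> list[str]:
    """No conflation: only identical strings match."""
    return [t for t in doc_terms if t.lower() == query_term.lower()]


def stem_match(query_term: str, doc_terms: list[str]) -> list[str]:
    """Stemming: terms match when their program-generated stems are equal."""
    target = toy_stem(query_term)
    return [t for t in doc_terms if toy_stem(t) == target]


def truncation_match(pattern: str, doc_terms: list[str]) -> list[str]:
    """Searcher truncation: 'retriev*' matches any term starting with 'retriev'."""
    if not pattern.endswith("*"):
        return raw_match(pattern, doc_terms)  # no truncation symbol: exact match
    prefix = pattern[:-1].lower()
    return [t for t in doc_terms if t.lower().startswith(prefix)]


# Hypothetical index vocabulary.
DOC_TERMS = ["retrieve", "retrieval", "retrievals", "retrieved",
             "retrieving", "retriever", "retrospective"]

if __name__ == "__main__":
    print(raw_match("retrieval", DOC_TERMS))         # ['retrieval']
    print(stem_match("retrieval", DOC_TERMS))        # five variants sharing 'retriev'
    print(truncation_match("retriev*", DOC_TERMS))   # same five plus 'retriever'
```

Raw matching finds only the identical string. Both conflation strategies gather the singular, plural, and inflected variants. They can still draw different classes: here the prefix match also captures "retriever", which the toy stemmer leaves unconflated because it has no rule for "-er".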
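The abstract does not define its deviation or combined recall-precision measures, so the following is only a sketch of how the first experiment's quantities could be computed, under these assumptions: a deviation is the signed difference in characters between a searcher's stem and the root, each query's average positive and negative deviations are taken over all of its truncated terms, van Rijsbergen's F-measure stands in for the combined measure, and Pearson correlation stands in for the correlation test. The names `deviation`, `query_deviations`, `f_measure`, and `pearson` are hypothetical.

```python
def deviation(stem: str, root: str) -> int:
    """Signed deviation, in characters, of a stem from its root boundary (assumed definition).

    Positive: the stem runs past the root (e.g. 'retrieva' vs root 'retriev').
    Negative: the stem stops short of the root (e.g. 'retrie' vs root 'retriev').
    """
    if not (stem.startswith(root) or root.startswith(stem)):
        raise ValueError(f"{stem!r} and {root!r} do not share a left-anchored boundary")
    return len(stem) - len(root)


def query_deviations(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Average positive and average negative deviation for one query.

    Both averages are taken over all (stem, root) pairs in the query and are
    reported as non-negative magnitudes (an assumption, not the thesis's formula).
    """
    devs = [deviation(stem, root) for stem, root in pairs]
    n = len(devs)
    avg_pos = sum(d for d in devs if d > 0) / n
    avg_neg = sum(-d for d in devs if d < 0) / n
    return avg_pos, avg_neg


def f_measure(recall: float, precision: float, beta: float = 1.0) -> float:
    """van Rijsbergen's F-measure, one common combination of recall and precision (E = 1 - F)."""
    if recall == 0 and precision == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation, e.g. between per-query deviations and per-query combined scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical (stem, root) pairs for one query, not data from the thesis.
    pairs = [("retrieva", "retriev"), ("retrie", "retriev"), ("retriev", "retriev")]
    print(query_deviations(pairs))
    print(f_measure(recall=1.0, precision=0.5))
```

Keeping positive and negative deviations separate, rather than averaging signed values, prevents a stem that overshoots the root and one that undershoots it from cancelling each other out within a query.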
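The abstract also does not say which significance test compared the stemmed and truncated searches. As one possible illustration, and explicitly an assumption rather than the thesis's procedure, the sketch below applies an exact two-sided sign test to paired per-query scores, such as a combined recall-precision score for the stemmed and the truncated version of each query. The function name `sign_test` and the sample scores are hypothetical.

```python
import math


def sign_test(scores_a: list[float], scores_b: list[float]) -> float:
    """Exact two-sided sign test on paired per-query scores; tied queries are dropped."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired query by query")
    wins = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    losses = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    n = wins + losses
    if n == 0:
        return 1.0  # every query tied: no evidence of a difference
    k = min(wins, losses)
    # Probability of a split at least this lopsided when each untied query is a
    # fair coin flip (p = 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical per-query scores, not results from the thesis.
    stemmed = [0.42, 0.55, 0.31, 0.60, 0.47]
    truncated = [0.40, 0.57, 0.31, 0.58, 0.49]
    print(sign_test(stemmed, truncated))
```

The sign test uses only the direction of each per-query difference, which keeps it free of distributional assumptions about the scores; a test that also weighs the size of the differences would be a reasonable alternative.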
Keywords/Search Tags: Information retrieval, Conflation, Term, Searchers, Thesis