
Word-sense disambiguation for large text databases

Posted on: 1996-11-29
Degree: Ph.D.
Type: Dissertation
University: University of Massachusetts Amherst
Candidate: Krovetz, Robert Jeffrey
GTID: 1468390014985770
Subject: Computer Science
Abstract/Summary:
Most retrieval systems represent documents and queries by the words they contain, and rank documents by the words they share with the query. Because words are ambiguous, this approach can retrieve documents that are not relevant. Conversely, a document can be relevant even if it does not contain the exact words used in the query. A user is generally interested not in the words themselves but in the concepts those words represent. We report on an analysis of lexical ambiguity in information retrieval test collections, and on experiments to determine how useful word meanings are for separating relevant from non-relevant documents.

Our research has examined different sources of evidence for distinguishing meanings. These sources can serve both to split meanings apart and to bring them together. For example, morphology separates author/authorize and universe/university, but brings together burglar/burglarize and sincere/sincerity. Similar distinctions hold for phrases and part of speech. Any effort to use word meanings in information retrieval must take these distinctions into account.

We discuss the results of experiments with these sources of evidence, and our proposals for future work.
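To make the point about morphology concrete, the following Python sketch scores documents by the number of distinct query words they share, under two ways of normalizing words. One is a blind suffix-stripping rule; the other conflates words only when a table says the variants share a meaning. The suffix list, the `SAME_MEANING` table, and all function names are hypothetical illustrations chosen for this example, not the method evaluated in the dissertation.

```python
import re

# Hypothetical, hand-written table (illustration only): variants are mapped
# to a shared form only when they share a meaning, e.g. burglar/burglarize
# and sincere/sincerity. Unlisted words are left unchanged, so
# author/authorize and universe/university stay distinct.
SAME_MEANING = {
    "burglarize": "burglar",
    "sincerity": "sincere",
}


def naive_stem(word: str) -> str:
    """Strip a suffix blindly, without checking whether the meaning survives."""
    for suffix in ("ize", "ity", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lexicon_norm(word: str) -> str:
    """Conflate a word only if the table lists it as a same-meaning variant."""
    return SAME_MEANING.get(word, word)


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def overlap_scores(query: str, docs: list[str], normalize) -> list[int]:
    """Score each document by how many distinct normalized query terms it shares."""
    q = {normalize(t) for t in tokens(query)}
    return [len(q & {normalize(t) for t in tokens(d)}) for d in docs]


docs = ["university campus news", "origins of the universe"]
print(overlap_scores("universe", docs, naive_stem))    # [1, 1]
print(overlap_scores("universe", docs, lexicon_norm))  # [0, 1]
```

The blind rule reduces both "universe" and "university" to "univers", so the campus document matches a query about the universe; the table-based normalization keeps them apart while still conflating burglar/burglarize and sincere/sincerity.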
Keywords/Search Tags: Words, Documents