
A statistical method for word-sense disambiguation

Posted on: 1996-08-10
Degree: Ph.D.
Type: Dissertation
University: New Mexico State University
Candidate: Bruce, Rebecca Frances
GTID: 1468390014986639
Subject: Computer Science
Abstract/Summary:
In this dissertation, I apply statistical techniques to the formidable natural language processing task of word-sense disambiguation. In particular, I develop probabilistic classifiers: systems that perform disambiguation by assigning, out of a set of word-meaning designations, the one that is most probable according to a probabilistic model. The model expresses the relationships among the classification variable (here, the variable representing the sense tag of the ambiguous word) and the variables that correspond to properties of the ambiguous word and of the context in which it occurs (the non-classification variables).

Statistical approaches to natural language processing are typically limited to simple models that include only a small number of non-classification variables drawn from the immediately surrounding context. This dissertation addresses that limitation. I present a procedure for automatic model selection that uses a richer class of probabilistic models than is typical in natural language processing, along with a technique for fitting such models to the data. That is, rather than assuming in advance which non-classification variables to use and how they are related, I describe a procedure that uses statistical techniques to answer these questions. Further, the types of models used in this work can express complex relationships among diverse sets of variables. These contributions are particularly important for word-sense disambiguation, where a tremendous number of non-classification variables, and of interactions among them, seem potentially relevant.

The claims made in formulating this model-selection procedure are supported experimentally. In total, I develop and test word-sense classifiers for twelve words: four nouns, four verbs, and four adjectives. Each word is disambiguated with respect to the full set of sense distinctions provided in the Longman Dictionary of Contemporary English.
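The classification rule described above (assign the sense tag that is most probable under a probabilistic model of the sense and the non-classification variables) can be illustrated with a minimal sketch. This is not the author's method: it assumes the simplest fixed model, Naive Bayes, in which every non-classification variable is taken to be conditionally independent of the others given the sense tag. That is an example of the kind of hand-chosen assumption about variable relationships that the dissertation's model-selection procedure is meant to replace. The function names, the toy "bank" data, the feature names (`left`, `right`), the sense labels, and the add-one smoothing are all hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count sense tags and (feature, sense) value co-occurrences.

    examples: list of (features_dict, sense) pairs (hypothetical format).
    """
    sense_counts = Counter()
    feat_counts = defaultdict(Counter)  # (feature_name, sense) -> Counter of values
    values = defaultdict(set)           # feature_name -> set of observed values
    for feats, sense in examples:
        sense_counts[sense] += 1
        for name, value in feats.items():
            feat_counts[(name, sense)][value] += 1
            values[name].add(value)
    return sense_counts, feat_counts, values


def classify(model, feats):
    """Return the sense s maximizing log P(s) + sum_i log P(f_i | s).

    Assumes Naive Bayes independence; P(f_i | s) uses add-one smoothing
    with one extra slot reserved for unseen feature values.
    """
    sense_counts, feat_counts, values = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n_s in sense_counts.items():
        score = math.log(n_s / total)
        for name, value in feats.items():
            k = len(values.get(name, ())) + 1  # observed values + one unseen slot
            count = feat_counts.get((name, sense), {}).get(value, 0)
            score += math.log((count + 1) / (n_s + k))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical toy data for the ambiguous word "bank".
training = [
    ({"left": "river", "right": "was"}, "shore"),
    ({"left": "muddy", "right": "of"}, "shore"),
    ({"left": "the", "right": "account"}, "finance"),
    ({"left": "savings", "right": "loan"}, "finance"),
    ({"left": "the", "right": "teller"}, "finance"),
]
model = train(training)
print(classify(model, {"left": "river", "right": "of"}))  # → shore
```

Because the Naive Bayes structure is fixed in advance, adding a richer model here would mean changing the independence assumptions by hand; the abstract's point is that which variables to include, and how they interact, should instead be selected statistically from the data.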
Keywords/Search Tags: Word, Statistical, Natural language processing, Non-classification variables