
A computational model of word and sentence meaning

Posted on: 2006-03-15
Degree: Ph.D.
Type: Dissertation
University: The University of Memphis
Candidate: Ventura, Matthew
Full Text: PDF
GTID: 1455390008961590
Subject: Psychology
Abstract/Summary:
One way cognitive scientists investigate language understanding is through distributional word meaning algorithms. These models rely on distributions of co-occurrence frequencies to construct relations between words from a record of language experience (e.g., a large corpus of natural language). Under this approach, the meaning of a word is based on the presence of consistent features across a distribution of multiple instances, so two words that tend to occur in similar linguistic contexts will have similar distributional profiles. A growing body of research suggests that distributional information plays a more powerful role in language processing than previously thought. This dissertation proposes an algorithm that implements contextual constraints among words by selecting the appropriate dimensions of corpus-based knowledge. Here the knowledge comes from a corpus and is captured in words, but the general algorithm could in principle operate over any sensory modality. The central idea is that context can be created by first having representations of individual words, built from their co-occurrence with other words in a corpus. As a sentence is constructed, intersections between the meanings of its words form a constrained global representation that gives additional salience to the particular dimensions the context systematically constrains.

While there are many corpus-based models of word meaning, the more difficult challenge is capturing sentence meaning. The first two experiments evaluated the Neighborhood Intersection (NI) model and Latent Semantic Analysis (LSA) in predicting sentence meanings rated by humans. In these experiments, both models generated similarity scores for pairs of word strings (e.g., "bear slept in cave" vs. "camel walked desert"). Results reveal that the NI model predicts human ratings comparably to LSA. These results suggest that the NI model, built on simple principles of word co-occurrence and feature overlap, is adequate for use in natural language processing.

The third experiment investigated NI model performance as a function of corpus size. Results revealed that the NI model is more sensitive to corpus manipulations, owing to the strong interdependency among the neighbors of the target words in a context. The fourth experiment investigated how NI model performance changes with varying associative window size, where a window is the boundary for what counts as a co-occurrence between words in a corpus. Results revealed that smaller window sizes capture more syntactic aspects of meaning, while larger window sizes capture more semantic aspects. One advantage of the NI model over other models of natural language understanding is the freedom to add information to the corpus in real time. Because the derived measures are computed online over the corpus, text can be added dynamically without difficulty; the weights between the affected words change as soon as the text is added. This can be effective for tutoring systems like AutoTutor that rely on natural language understanding tools to assess user knowledge on various topics. In these situations, the NI model can continually assess users' incremental knowledge gains by accumulating their feedback into a user corpus.
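To make the co-occurrence machinery concrete, here is a minimal sketch in Python. It is not the dissertation's implementation; the function names and parameter values are illustrative. The window parameter corresponds to the associative window manipulated in the fourth experiment: smaller values favor local, syntax-like neighbors, larger values favor more topical, semantic neighbors.

    from collections import defaultdict

    def update_counts(counts, tokens, window=5):
        # Every pair of words within `window` positions of each other
        # counts as one co-occurrence, in both directions.
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][tokens[j]] += 1

    # Build word representations from a toy corpus of tokenized sentences.
    counts = defaultdict(lambda: defaultdict(int))
    corpus = [["the", "bear", "slept", "in", "the", "cave"],
              ["the", "camel", "walked", "across", "the", "desert"]]
    for doc in corpus:
        update_counts(counts, doc)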
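The abstract does not give the NI model's formulas, so the following is one plausible reading under stated assumptions: a word's neighborhood is taken to be its k strongest co-occurring words, a string's representation is the intersection of its words' neighborhoods, and similarity between two strings is the overlap of those representations. The cutoff k, the union fallback, and the Jaccard overlap measure are all assumptions, not details from the dissertation.

    def neighborhood(counts, word, k=50):
        # A word's neighborhood: its k strongest co-occurring words
        # (the cutoff k is an assumed parameter, not from the abstract).
        ranked = sorted(counts[word].items(), key=lambda kv: -kv[1])
        return {w for w, _ in ranked[:k]}

    def sentence_representation(counts, words, k=50):
        # Intersect the neighborhoods of the string's words; the shared
        # dimensions are the ones the context systematically constrains.
        sets = [neighborhood(counts, w, k) for w in words if w in counts]
        if not sets:
            return set()
        shared = set.intersection(*sets)
        # Assumed fallback: if the strict intersection is empty, back off
        # to the union so short or unrelated strings still get a score.
        return shared if shared else set.union(*sets)

    def ni_similarity(counts, words_a, words_b, k=50):
        # Assumed overlap measure: Jaccard similarity between the two
        # constrained representations.
        a = sentence_representation(counts, words_a, k)
        b = sentence_representation(counts, words_b, k)
        return len(a & b) / len(a | b) if (a | b) else 0.0

    # Score a pair of word strings, as in the first two experiments.
    print(ni_similarity(counts, ["bear", "slept", "in", "cave"],
                        ["camel", "walked", "desert"]))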
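Because the representation is just a table of counts, the dynamic-corpus property described above follows directly: folding new text in is the same operation as the initial build. A hypothetical tutoring-system update, reusing update_counts from the sketch above, might look like this:

    def add_user_response(counts, user_corpus, tokens, window=5):
        # Fold a user's new response into the shared co-occurrence table;
        # the affected word-word weights change as soon as the text is added.
        user_corpus.append(tokens)
        update_counts(counts, tokens, window)

    user_corpus = []
    add_user_response(counts, user_corpus,
                      ["the", "bear", "hibernates", "in", "the", "cave"])
    # Any similarity computed after this call already reflects the new text.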
Keywords/Search Tags: Model, Word, Corpus, Meaning, Language, Latent semantic analysis, Sentence, Co-occurrence