Automatic construction of a hypernym-labeled noun hierarchy from text

Posted on: 2002-06-02
Degree: Ph.D
Type: Thesis
University: Brown University
Candidate: Caraballo, Sharon Ann
Full Text: PDF
GTID: 2468390011498918
Subject: Computer Science
Abstract/Summary:
Many language processing tasks are dependent on large databases of lexical semantic information, such as WordNet. These hand-built resources are tremendously time-consuming to create and may be lacking in coverage. They may be particularly inappropriate for text from a single domain, both because domain-specific terms are missing and because the lexicon contains many words or meanings which would be extremely rare in that domain. This thesis describes statistical techniques to automatically extract semantic information about words from text; specifically, given a large corpus of text and no additional sources of semantic information, we build a hierarchy of nouns appearing in the text. The hierarchy is in the form of an IS-A tree, where the nodes of the tree contain one or more nouns, and the ancestors of a node contain hypernyms of the nouns in that node. (An English word A is said to be a hypernym of a word B if native speakers of English accept the sentence “B is a (kind of) A.”) The techniques presented here could be used in the construction of updated or domain-specific semantic resources as needed. The methods described here provide a substantial improvement over previously published results; while we could previously produce a hierarchy whose internal nodes were judged to be correct hypernyms for 33% of the nouns beneath them, we can now achieve 56% on this measure. The thesis also includes a detailed discussion of a particular subproblem: determining which of a pair of nouns is more specific. We identify numerical measures which can be easily computed from a text corpus and which can answer this question with over 80% accuracy.
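As an illustration only, the IS-A tree described in the abstract might be represented as in the sketch below. This is not the thesis's implementation: the class name `Node`, the helpers `find` and `hypernyms`, and the toy nouns are all hypothetical, and the tree is built by hand here rather than extracted from a corpus. The sketch shows only the structural property stated above: each node holds one or more nouns, and the nouns in a node's ancestors are read off as its hypernyms.

```python
class Node:
    """One node of a hypothetical IS-A tree; holds one or more nouns."""

    def __init__(self, nouns, parent=None):
        self.nouns = list(nouns)
        self.parent = parent
        self.children = []

    def add_child(self, nouns):
        """Attach a new child node holding the given nouns and return it."""
        child = Node(nouns, parent=self)
        self.children.append(child)
        return child


def find(root, noun):
    """Depth-first search for the node containing `noun`, or None."""
    stack = [root]
    while stack:
        node = stack.pop()
        if noun in node.nouns:
            return node
        stack.extend(node.children)
    return None


def hypernyms(root, noun):
    """Collect the nouns stored in every ancestor of the node holding `noun`,
    nearest ancestor first. Returns an empty list for unknown nouns."""
    node = find(root, noun)
    if node is None:
        return []
    result = []
    current = node.parent
    while current is not None:
        result.extend(current.nouns)
        current = current.parent
    return result


# Toy hierarchy, assembled by hand purely for illustration.
root = Node(["entity"])
animal = root.add_child(["animal"])
animal.add_child(["dog", "hound"])
vehicle = root.add_child(["vehicle"])
vehicle.add_child(["car", "automobile"])

print(hypernyms(root, "dog"))  # ['animal', 'entity']
```

In this toy tree, "dog" and "hound" share a node, so both receive the same hypernym list, matching the abstract's statement that a node may contain more than one noun.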
Keywords/Search Tags: Text, Semantic information, Hierarchy