Automatic construction of a hypernym-labeled noun hierarchy from text

Posted on:2002-06-02

Degree:Ph.D

Type:Thesis

University:Brown University

Candidate:Caraballo, Sharon Ann

GTID:2468390011498918

Subject:Computer Science

Abstract/Summary:

Many language processing tasks are dependent on large databases of lexical semantic information, such as WordNet. These hand-built resources are tremendously time-consuming to create and may be lacking in coverage. They may be particularly inappropriate for text from a single domain, both because domain-specific terms are missing and because the lexicon contains many words or meanings which would be extremely rare in that domain. This thesis describes statistical techniques to automatically extract semantic information about words from text; specifically, given a large corpus of text and no additional sources of semantic information, we build a hierarchy of nouns appearing in the text. The hierarchy is in the form of an IS-A tree, where the nodes of the tree contain one or more nouns, and the ancestors of a node contain hypernyms of the nouns in that node. (An English word A is said to be a hypernym of a word B if native speakers of English accept the sentence “B is a (kind of) A.”) The techniques presented here could be used in the construction of updated or domain-specific semantic resources as needed. The methods described here provide a substantial improvement over previously published results; while we could previously produce a hierarchy whose internal nodes were judged to be correct hypernyms for 33% of the nouns beneath them, we can now achieve 56% on this measure. The thesis also includes a detailed discussion of a particular subproblem: determining which of a pair of nouns is more specific. We identify numerical measures which can be easily computed from a text corpus and which can answer this question with over 80% accuracy.
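
As an illustration of the IS-A tree described in the abstract, the following is a minimal sketch in Python. It is not the thesis's implementation: the class name `Node`, the helpers `find` and `hypernyms`, and the toy nouns are all hypothetical, and the tree is built by hand rather than induced from a corpus. It shows only the structural property stated above: each node holds one or more nouns, and the nouns in a node's ancestors are read as hypernyms of the nouns in that node.

```python
class Node:
    """One node of an IS-A tree; holds one or more nouns (hypothetical sketch)."""

    def __init__(self, nouns, parent=None):
        self.nouns = frozenset(nouns)
        self.parent = parent
        self.children = []

    def add_child(self, nouns):
        # Attach a more specific node beneath this one and return it.
        child = Node(nouns, parent=self)
        self.children.append(child)
        return child


def find(root, noun):
    """Depth-first search for the node containing `noun`; None if absent."""
    stack = [root]
    while stack:
        node = stack.pop()
        if noun in node.nouns:
            return node
        stack.extend(node.children)
    return None


def hypernyms(root, noun):
    """Collect the nouns of every ancestor, nearest ancestor first."""
    node = find(root, noun)
    if node is None:
        return []
    result = []
    node = node.parent
    while node is not None:
        result.extend(sorted(node.nouns))
        node = node.parent
    return result


# Hand-built toy hierarchy (assumed data, for illustration only).
root = Node({"entity"})
animal = root.add_child({"animal"})
dog = animal.add_child({"dog", "hound"})
vehicle = root.add_child({"vehicle"})
car = vehicle.add_child({"car", "automobile"})

print(hypernyms(root, "dog"))   # ['animal', 'entity']
print(hypernyms(root, "car"))   # ['vehicle', 'entity']
```

Storing a set of nouns per node, rather than a single noun, mirrors the abstract's statement that a node may contain more than one noun; the parent pointer makes the hypernym lookup a simple walk toward the root.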

Keywords/Search Tags:

Text, Semantic information, Hierarchy

Related items

1	Research Of Text Mining About Semantic Relation Recognition
2	Research On Text Semantic Orientation Analysis For Areas Of Applied Based On The Web Information
3	A Research On Text Analysis And Representation Based On Semantic Information
4	Sensitivity of Semantic Signatures in Text Mining
5	Research On Ontology-Based Semantic Text Categorization
6	Hierarchical Semantic Structure Based Text Stream Mining
7	Research On The Hierarchy Visualization Of Incremental Text Clustering
8	Researching Text Classification Using Semantic And Sequence Information
9	The Representation Of Chinese Semantic Knowledge And Its Application In The Chinese-English MT System
10	Researches On Technologies Of Multi-Hierarchy Semantic Video Object Description Model And Extraction