Toward knowledge-free induction of machine-readable dictionaries

Posted on:2002-09-04

Degree:Ph.D

Type:Dissertation

University:University of Colorado at Boulder

Candidate:Schone, Patrick John


GTID:1468390011491373

Subject:Computer Science

Abstract/Summary:

Machine-readable dictionaries (MRDs) have found uses in many natural language tasks. Current MRDs are typically generated either by the expensive process of digitizing hard-copy dictionaries or by construction by hand. It would be valuable if MRDs could be built automatically from a corpus of text. Moreover, if induction were knowledge-free, it could be applied across ages, across domains, and perhaps even to non-language problems.

In this research, we focus on three major subtasks of language-independent, near-knowledge-free induction of MRDs. In particular, we concentrate on inducing dictionary headwords, identifying language morphologies, and clustering and labeling parts of speech. Unlike past research efforts, our algorithms make use of no language-specific information, and most, in fact, are completely knowledge-free.

To induce dictionary headwords, we identify nine currently available algorithms for multiword unit selection and, based on extensive comparisons to both static and dynamic gold standards, we isolate the algorithms that are best for our task. We then explore semantic non-compositionality and non-substitutivity as means of improving performance. We find that non-substitutivity provides modest improvements. Since not all languages are amenable to multiword unit processing, we also identify segmentation strategies and show how to enhance those strategies.

Next we introduce a new knowledge-free methodology for automatic acquisition of morphology. This approach makes use of character trees, singular value decomposition, induced syntactic constraints, orthographic information, and transitivity. We measure its performance in German, Dutch, and English and show that it outperforms other existing knowledge-free algorithms.

Using this morphological information and distributional information, we introduce a novel approach to clustering words based on syntactic usage. We proceed to automatically affix actual part-of-speech tags to those clusters without using any training data or lexicons. We couple language universals (our sole human input) with features extracted from the clusters and tag the clusters using a probabilistic framework.

A number of algorithms already exist for finding semantic relationships. We therefore conclude by describing such algorithms and by discussing how the components we have induced could be combined with existing strategies in order to yield actual definitions.
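
The abstract names character trees as one ingredient of the morphology-induction step but gives no further detail. The following is a minimal Python sketch of the general idea only, not the dissertation's algorithm: it builds a character tree (trie) over a word list and treats the remainder of a word after any branching node as a candidate suffix. The function names, the `min_stem` threshold, and the `"$"` end-of-word marker are all assumptions made for illustration.

```python
from collections import Counter

END = "$"  # hypothetical end-of-word marker, assumed not to occur in words


def build_trie(words):
    """Build a character tree as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}  # mark that a complete word ends at this node
    return root


def candidate_suffixes(words, min_stem=3):
    """Count candidate suffixes found at branching points of the trie.

    A node with two or more children means that several words share the
    prefix leading to it, so whatever follows that prefix in a given word
    is recorded as a candidate suffix (the empty string stands for a
    null suffix). Prefixes shorter than min_stem are ignored.
    """
    trie = build_trie(words)
    counts = Counter()
    for word in set(words):
        node = trie
        for i, ch in enumerate(word):
            node = node[ch]
            if i + 1 >= min_stem and len(node) >= 2:
                counts[word[i + 1:]] += 1
    return counts
```

Calling `candidate_suffixes` on a small English list such as `["walk", "walks", "walked", "walking", "talk", "talks", "talked", "talking"]` proposes `s`, `ed`, `ing`, and the null suffix, each supported by two stems. A sketch like this over-generates on real corpora, which is presumably why the abstract lists further evidence (singular value decomposition, induced syntactic constraints, orthographic information, and transitivity) alongside character trees.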

Keywords/Search Tags:

Knowledge-free, Induction, MRDs

Related items

1	Research On Control Technology Of Free Induction Heating Based On STM32 Microcontroller
2	Research On Simulaiton Of7-DOF Welding Robot Based On MRDS
3	Automatic Construction Of Finanical Domain Knowledge Graph With Template Induced Method
4	Research On The Impact Of Providing Paid Knowledge Service On Users' Free Knowledge Contribution:Evidence From An Online Q&A Community
5	A new approach of top-down induction of decision trees for knowledge discovery
6	Research And Implemention Of Array Induction Tool Based On WTS
7	Taxonomy Induction Research On Knowledge Base From Chinese Encyclopedia
8	From Free Sharing To Payment Transmission
9	Design And Implementation Of Staff Induction Service System For Shengjing Bank Of Anshan
10	Gene expression programming and rule induction for domain knowledge discovery and management