The Construction Of Mongolian Homographs Knowledge Base

Posted on:2011-12-16

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Q Shu

GTID:1115360305991165

Subject:Chinese Ethnic Language and Literature

Abstract/Summary:

According to statistics, Mongolian homographs account for 18% of dictionary entries when counted statically, and for 55% of the running words in a corpus when counted dynamically. A comprehensive and systematic study of homographs therefore plays an important role in Mongolian language teaching and lexicography. In Mongolian information processing, homograph recognition is the bottleneck for pronunciation recognition, morphological analysis, part-of-speech tagging and semantic tagging. In this research, the author constructs a Mongolian homographs knowledge base comprising the following parts: a homographs electronic dictionary; a one-million-word corpus in which homographs are manually recognized and tagged; a collocation base, a co-occurrence base and a synonym base for homographs; a management and maintenance tool for the homographs electronic dictionary; a statistical tool for co-occurrence components; and an automatic homograph recognition tool. The homographs knowledge base is an organic part of the Mongolian comprehensive knowledge base.

This dissertation consists of an introduction and six chapters.

The introduction explains the object of the research, the terminology, the state of research, and the significance, steps, methods and material sources of the study.

Chapter 1 discusses the relationships between homographs and homonyms, between homographs and conversion words, and between homographs and polysemy, then summarizes the types and sources of homographs.

Chapter 2 describes in detail the development of the homographs electronic dictionary, including word sources, the principles and methods of word selection, and the specifications of the attribute fields and their values.

Chapter 3 introduces the structure, functions, characteristics and remaining problems of the management and maintenance tool.

Chapter 4 briefly introduces the construction of the training set, namely the manual recognition and tagging of homographs in the one-million-word corpus, and then estimates the distribution of homographs in the Mongolian corpus.

Chapter 5 builds the homographs' collocation base, co-occurrence base and synonym base from the dictionary, and uses the one-million-word corpus to calculate various statistical values for the co-occurrence components of homographs.

Chapter 6 implements automatic homograph recognition based on the collocation base and the co-occurrence base; testing shows a recall rate of 99.8% with a precision rate of 81.7%. The test results are then analyzed in detail.
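The abstract does not say how the recognition tool scores candidate readings or exactly how recall and precision are defined. The following is a minimal Python sketch under stated assumptions, not the dissertation's method. It scores each candidate reading of a homograph by summing how often the surrounding words co-occurred with that reading in manually tagged training data, keeps all tied readings, and evaluates with a set-based convention (recall = share of occurrences whose gold reading is among the output candidates; precision = share of output candidates that are correct). This is one convention under which recall can exceed precision. The data layout, the names `train_cooccurrence`, `recognize`, `evaluate`, `TRAIN`, `READINGS`, and the placeholder tokens and readings are all hypothetical, not Mongolian data from the dissertation.

```python
from collections import Counter, defaultdict

# Hypothetical layout: each training item is (context_words, homograph, gold_reading),
# standing in for one occurrence tagged by hand in a corpus. Tokens are placeholders.
TRAIN = [
    (["a", "b"], "X", "noun"),
    (["a", "c"], "X", "noun"),
    (["d", "e"], "X", "verb"),
]
# Hypothetical lexicon: the candidate readings of each homograph.
READINGS = {"X": ["noun", "verb"]}


def train_cooccurrence(examples):
    """Count how often each context word co-occurs with each reading of a homograph."""
    counts = defaultdict(Counter)  # (homograph, reading) -> Counter of context words
    for context, homograph, reading in examples:
        counts[(homograph, reading)].update(context)
    return counts


def recognize(context, homograph, counts):
    """Return every reading whose summed co-occurrence score is highest (ties kept)."""
    scores = {
        r: sum(counts[(homograph, r)][w] for w in context)
        for r in READINGS[homograph]
    }
    best = max(scores.values())
    return {r for r, s in scores.items() if s == best}


def evaluate(predictions, golds):
    """Set-based scores: recall over occurrences, precision over output candidates."""
    hits = sum(1 for cands, gold in zip(predictions, golds) if gold in cands)
    precision = hits / sum(len(cands) for cands in predictions)
    recall = hits / len(golds)
    return precision, recall


counts = train_cooccurrence(TRAIN)
test = [(["a"], "X", "noun"), (["e"], "X", "verb"), (["z"], "X", "noun")]
predictions = [recognize(ctx, h, counts) for ctx, h, _ in test]
precision, recall = evaluate(predictions, [gold for _, _, gold in test])
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=1.00
```

On this toy data the context word `"z"` was never seen in training, so both readings tie and both are kept. That keeps recall at 1.0 but lowers precision to 0.75, illustrating how a tie-keeping recognizer trades precision for recall. A fuller system would presumably also consult the collocation base and use association measures rather than raw counts.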

Keywords/Search Tags:

Mongolian, homographs, knowledge base, language resources

Related items

1	The Construction Of Cyrillic Mongolian Homographs Knowledge Base
2	The Construction Of Mongolian Chinese English Language Knowledge Base
3	Design And Implementation Of The Management Platform For Mongolian Knowledge Base
4	Construction And Development Of Modern Mongolian Adverb Knowledge Base Based On Corpus
5	Construction And Research On The Knowledge Base Of The Mongolian Three Word Sets
6	The Construction Of The Knowledge-Base For Mongolian Conjunctive Form
7	Research On Automatic Recognition For Base Verb Phrases In Mongolian Language
8	Analysis Of Language Features Of English Abstracts And Construction Of A Knowledge Base
9	The Construction Of Chinese Morpheme Words Knowledge Base And Its Application In Understanding Unregistered Words
10	Research On The Prototype System Of Sino-Tibetan Cross-language Tourism Field Relationship Extraction And Knowledge Base Construction