| According to statistics, Mongolian homographs account for 18% of the dictionary under static (lexicon-based) counting and 55% of the corpus under dynamic (running-text) counting. A comprehensive and systematic study of homographs plays an important role in Mongolian language teaching and lexicography. Homograph recognition is the bottleneck for pronunciation recognition, morphological analysis, part-of-speech tagging, and semantic tagging in Mongolian information processing. In this research, the author constructs a Mongolian homograph knowledge base comprising the following parts: a homograph electronic dictionary; a one-million-word corpus in which homographs have been manually recognized and tagged; a collocation base, a co-occurrence base, and a synonym base for homographs; a management and maintenance tool for the homograph electronic dictionary; a statistical tool for co-occurrence components; and an automatic homograph recognition tool. The homograph knowledge base is an integral part of the Mongolian comprehensive knowledge base. This paper consists of an introduction and six chapters. The introduction explains the object of the research, the terminology, the state of research, and the significance, steps, methods, and sources of material. The first chapter discusses the relationships between homographs and homonyms, homographs and conversion words, and homographs and polysemy, then summarizes the types and sources of homographs. The second chapter describes in detail the development of the homograph electronic dictionary, including the word sources, the principles and methods of word selection, and the specifications for attribute fields and their values. The third chapter introduces the structure, functions, characteristics, and existing problems of the management and maintenance tool. The fourth chapter briefly introduces the construction of the training set, that is, the manual recognition and tagging of homographs in the one-million-word corpus, and then estimates the distribution of homographs in Mongolian corpora. The fifth chapter builds the collocation base, co-occurrence base, and synonym base for homographs from the dictionary, and computes statistics for the homographs' co-occurrence components over the one-million-word corpus. The sixth chapter implements automatic homograph recognition based on the collocation base and the co-occurrence base; testing shows a recall of 99.8% with a precision of 81.7%. The test results are then analyzed in detail. |