Research On Cyrillic And Mongolian Script’s Morphology And Conversion System

Posted on: 2015-03-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G B T E Ao
Full Text: PDF
GTID: 1268330428482695
Subject: Computer application technology
Abstract/Summary:
Although the Mongolian people have used several scripts over the course of their history, three main scripts are in use: the Traditional Mongolian script, Cyrillic Mongolian, and the Tod script. This thesis studies morphology and script conversion between two of them, the Traditional Mongolian script and Cyrillic Mongolian. The introduction explains the significance of the research in detail and sets out its aim and objectives. Countries that recognize language processing as critical to building the next generation of knowledge-based, knowledge-processing computers have supported the field strongly through public policy, national research centers, and many capital-intensive national projects. Coordinating Mongolian studies with modern technology and developing Mongolian computational linguistics are therefore pressing needs. Recognizing Mongolian words and sentences by computer helps reveal and study the principles and features of Mongolian with modern approaches and technologies, and this work provides a basis for our further research. Although some Mongolian companies and individuals have carried out research and analysis and created applications and programs in this field, the results remain unsatisfactory compared with the level of other countries, and no unified system for the field has yet been created. I therefore chose computer processing of Mongolian as the main subject of this thesis. In this work we perform morphological analysis of both Cyrillic Mongolian and the Traditional Mongolian script and define, by computer, how affixes are inflected according to orthographic rules. The aim is to convert Cyrillic Mongolian text into the Traditional Mongolian script and vice versa.
The conversion runs in the following steps. First, a Cyrillic Mongolian or Traditional Mongolian word is analyzed morphologically to find its stem and affixes, which are then converted into the other script. The converted stem and affixes are then joined to generate the word in the target script. This combined process belongs to morphology within computational linguistics. Words that the Traditional Mongolian script writes differently according to their meaning are written identically in the Cyrillic script, so the meaning of such words must also be determined. Within the framework of this research, we carried out the following activities.
1. We described the features of the Cyrillic Mongolian and Traditional Mongolian scripts, the Mongolian parts of speech, and word structure. The Traditional Mongolian script is a phonetic script in which many words share the same pronunciation; it observes morphological principles and tradition. The Cyrillic Mongolian script observes phonetic principles and has the drawback of not observing the others. From the standpoint of computational linguistics, the two scripts share some features but differ in many more. Their orthographies are similar in some respects, because the scholars who created the Cyrillic orthography stated that it was based on the rules of the Traditional Mongolian script. The Cyrillic Mongolian orthography now in use consists of 66 articles, whereas the Traditional Mongolian script, inherited over a thousand years, has only three rules: vowel harmony, the rule for syllable-final consonants, and the rule for combining vowels. Mongolian is an agglutinative language: words are formed and inflected by attaching suffixes and affixes to a stem.
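The stem/affix pipeline described above (analyze the word, convert stem and suffix separately, rejoin them) can be sketched as follows. This is a minimal Python illustration, not the thesis's implementation: the two-entry stem lexicon, the suffix code, the Latin transliteration used for the Traditional Mongolian forms, and the function names `segment` and `convert` are all hypothetical, and only one suffix (the ablative) is modeled.

```python
"""Sketch: convert one Cyrillic Mongolian word to (transliterated)
Traditional Mongolian by splitting it into stem + suffix, converting each
part separately, and rejoining them. All data below is hypothetical."""

# Hypothetical stem lexicon: Cyrillic stem -> (Traditional stem in Latin
# transliteration, vowel-harmony class of the Traditional stem).
STEMS = {
    "ном": ("nom", "back"),
    "гэр": ("ger", "front"),
}

# Hypothetical numbered suffix table: code -> (Cyrillic allomorphs,
# Traditional allomorph for each harmony class). Code 1 = ablative.
SUFFIXES = {
    1: (("аас", "оос", "ээс", "өөс"), {"back": "ača", "front": "eče"}),
}


def segment(word):
    """Return all (stem, suffix_code) analyses; suffix_code is None for a bare stem.

    Cyrillic vowel-harmony agreement between stem and suffix is not checked.
    """
    analyses = []
    for i in range(1, len(word) + 1):
        stem, rest = word[:i], word[i:]
        if stem not in STEMS:
            continue
        if not rest:
            analyses.append((stem, None))
        for code, (cyrillic_forms, _) in SUFFIXES.items():
            if rest in cyrillic_forms:
                analyses.append((stem, code))
    return analyses


def convert(word):
    """Convert a Cyrillic word; '-' marks the stem/suffix boundary in the output."""
    analyses = segment(word)
    if not analyses:
        raise ValueError(f"no analysis for {word!r}")
    stem, code = analyses[0]  # a real system would rank competing analyses
    trad_stem, harmony = STEMS[stem]
    if code is None:
        return trad_stem
    trad_forms = SUFFIXES[code][1]
    return f"{trad_stem}-{trad_forms[harmony]}"


if __name__ == "__main__":
    for w in ["ном", "номоос", "гэрээс"]:
        print(w, "->", convert(w))
```

The Traditional suffix is chosen from the Traditional stem's harmony class rather than copied from the Cyrillic ending, reflecting the point that each script has its own rules for attaching suffixes.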
The Cyrillic Mongolian and Traditional Mongolian scripts follow different rules for attaching suffixes and affixes to a stem. This is a feature not of the Mongolian language itself but of the orthographic rules followed in each script. When a Mongolian noun, adjective or pronoun appears in a sentence, it is inflected with plural, case and possessive suffixes, whereas a verb is inflected with voice, state, temporal ending, possessive ending, subordinating conjunctive and determining suffixes. We developed models of noun and verb inflection, calculated the possible suffix sequences, and formulated suffix combination rules.
2. After carrying out the research above, we needed a suitable database. I therefore created a Mongolian morphological database and an inflectional suffix database that meet the requirements of the Mongolian language and of this research. These databases will underpin many of our future tasks in computational linguistics; with them, we first complete the research on Mongolian morphology and the script conversion system. Storing each word stem together with all of its grammatically inflected forms as separate entries would be the simplest but crudest method, so we defined the database unit as the "word stem".
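One way to express suffix combination rules like those above is a slot template: each suffix belongs to a slot, slots must appear in a fixed order, and each slot holds at most one suffix. The Python sketch below assumes simplified templates and a toy inventory with hypothetical labels (`PL`, `ABL`, `CAUS`, and so on); it checks a sequence and enumerates every sequence the template allows. It does not model the exceptions to fixed suffix order that the thesis mentions.

```python
from itertools import product

# Simplified slot templates (an assumption for illustration): slots must
# appear in this order, and each slot holds at most one suffix.
TEMPLATES = {
    "noun": ["plural", "case", "possessive"],
    "verb": ["voice", "aspect", "ending"],
}

# Toy suffix inventory with hypothetical labels: label -> slot.
SLOT_OF = {
    "PL": "plural",
    "GEN": "case", "ABL": "case", "DAT": "case",
    "REFL_POSS": "possessive",
    "CAUS": "voice", "PASS": "voice",
    "PROG": "aspect",
    "PAST": "ending", "CONV": "ending", "PTCP": "ending",
}


def is_valid_sequence(pos, suffixes):
    """True if `suffixes` respects the slot order of part of speech `pos`."""
    slots = TEMPLATES[pos]
    last = -1
    for label in suffixes:
        slot = SLOT_OF.get(label)
        if slot not in slots:
            return False  # unknown suffix, or suffix of another part of speech
        index = slots.index(slot)
        if index <= last:
            return False  # out of order, or two suffixes in the same slot
        last = index
    return True


def possible_sequences(pos):
    """Enumerate every suffix sequence the template allows (including none)."""
    choices = [
        [None] + [label for label, s in SLOT_OF.items() if s == slot]
        for slot in TEMPLATES[pos]
    ]
    return [tuple(x for x in combo if x is not None) for combo in product(*choices)]
```

With this toy inventory a noun has 2 × 4 × 2 = 16 possible sequences (including the bare stem), since each slot is either empty or filled by exactly one of its suffixes.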
The main advantages of using the word stem as the database unit are that the number of stored words is not artificially inflated, the program runs faster, and grammatical word forms are derived from the grammar, so all possible transformations can be covered. The basic database consists of three parts: a primary database containing the primary key, Cyrillic Mongolian and Traditional Mongolian headwords, and explanations (72,210); a word-class database (53,294); and an inflection database with codes indicating grammatical inflection (48,000). We also created a vocabulary of 1,100 abbreviations and a vocabulary of 9,135 proper nouns. According to our research, Mongolian has 86 suffixes, such as instrumental, directive, dative-locative, plural and negative suffixes. We numbered these suffixes and created a suffix vocabulary giving each suffix's Cyrillic Mongolian and Traditional Mongolian forms. The order in which suffixes stack follows precise principles: each morpheme in a word has its own fixed position and order, and morpheme boundaries are clear. Some exceptions, however, break these rules of position and order. For both scripts, we created a database of precisely determined suffix sequences.
3. On the basis of these activities and the databases created, I set the goal of performing Mongolian morphological analysis with two-level morphology. We showed how to model the rules of the Traditional Mongolian script and of Cyrillic Mongolian for morphological analysis, using finite-state automata and two-level morphology. With this model we ran experiments on parsing words into their structure and on generating words, studied the approach in depth, and put it into practical use. In the course of the work it became clear that two-level finite-state morphology can be applied to Mongolian.
The applicability of two-level finite-state morphology allows us to use word generation and parsing in further research. In computational morphology, both parsing words and generating words with inflectional affixes must be based on finite-state automata, so it is important to design automata that inflect the words stored as database units. We classified the database into inflecting and non-inflecting words and divided the inflecting words into nouns and verbs, so word inflection reduces to noun inflection and verb inflection. For a description to be processed in PC-KIMMO, every rule must be written correctly and checked in turn. A chapter of the thesis also considers approaches to creating rules. We modeled the Mongolian rules and performed morphological analysis, modeling Cyrillic Mongolian and the Traditional Mongolian script separately and creating a suitable rule file for each. Using the rule files and lexicon files, we developed morphological analysis software for Cyrillic Mongolian and the Traditional Mongolian script and tested it successfully. The word automaton must parse the text entered by the user, process it, generate the correct word by attaching the appropriate affixes to the stem, and display the result or the word's structure. Processing Unicode text requires some additional work. As a result, Cyrillic Mongolian and Traditional Mongolian texts can be processed, and the first version of the KIM_MON program was developed. The result of text processing does not depend on the character encoding (Latin, Cyrillic, etc.) but directly on how complete and well classified the lexicon files are and how correctly the rules are defined.
4. I ran experiments on Mongolian morphological analysis using the KIM_MON program and the databases created; the results are summarized briefly here. In morphological parsing of text, the rate of correct conversion was 97.6%. In word generation (attaching affixes), the words that had been parsed correctly were also generated correctly.
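An inflection automaton over database stems, with separate paths for inflecting and non-inflecting words and support for both parsing and generation, might look like the following Python sketch. Each sublexicon maps a morph to a gloss and the next sublexicon, similar in spirit to the continuation classes of KIMMO-style lexicon files. The sublexicon names, glosses and the handful of Cyrillic entries are illustrative assumptions, only nouns are covered, and vowel harmony is encoded by routing o-stems and front-vowel stems to different case sublexicons.

```python
"""Sketch of an inflection automaton over database stems.

Each sublexicon maps a morph to (gloss, next sublexicon); '#' is the final
state. Entries and names are illustrative; only nouns are modeled.
"""

LEXICON = {
    # Inflecting noun stems continue to a harmony-specific class; the
    # non-inflecting conjunction 'ба' ("and") goes straight to '#'.
    "ROOT": {"ном": ("book", "N_O"), "гэр": ("home", "N_E"), "ба": ("and", "#")},
    "N_O": {"": (None, "CASE_O"), "ууд": ("PL", "CASE_A")},  # o-stem nouns
    "N_E": {"": (None, "CASE_E"), "үүд": ("PL", "CASE_E")},  # front-vowel nouns
    "CASE_O": {"": (None, "#"), "ын": ("GEN", "#"), "оос": ("ABL", "#")},
    "CASE_A": {"": (None, "#"), "ын": ("GEN", "#"), "аас": ("ABL", "#")},
    "CASE_E": {"": (None, "#"), "ийн": ("GEN", "#"), "ээс": ("ABL", "#")},
}


def parse(word, state="ROOT", glosses=()):
    """Yield every gloss list for a path that consumes `word` and ends in '#'."""
    if state == "#":
        if word == "":
            yield list(glosses)
        return
    for morph, (gloss, nxt) in LEXICON[state].items():
        if word.startswith(morph):
            new = glosses + ((gloss,) if gloss else ())
            yield from parse(word[len(morph):], nxt, new)


def generate(glosses, state="ROOT"):
    """Yield every surface word whose path through the lexicon has these glosses."""
    if state == "#":
        if not glosses:
            yield ""
        return
    for morph, (gloss, nxt) in LEXICON[state].items():
        if gloss is None:
            rest = glosses  # empty transition consumes no gloss
        elif glosses and glosses[0] == gloss:
            rest = glosses[1:]
        else:
            continue
        for tail in generate(rest, nxt):
            yield morph + tail
```

Because one table drives both directions, a parse-then-generate round trip is a simple consistency check: `parse("номуудаас")` gives `[["book", "PL", "ABL"]]`, and generating from those glosses returns `номуудаас`.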
Conversion with the developed algorithm gave the following results: a) from Cyrillic Mongolian to the Traditional Mongolian script, the word-sense recognition rate was 91.3%; b) from the Traditional Mongolian script to Cyrillic Mongolian, it was 89.1%. In the dedicated word-sense recognition experiment, the recognition rate was 86.9%. The experiments suggest that building a large training database can increase the recognition rate.
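The abstract does not specify how word sense is recognized during conversion. As one plausible illustration of why a larger training database should help, the Python sketch below chooses among several Traditional spellings of an ambiguous Cyrillic word by counting the neighboring words seen with each spelling in aligned training sentences, falling back to the most frequent spelling. It also computes a recognition rate as correct choices divided by test items, assuming the reported percentages are per-word rates of that kind. The class name, the one-word context window and the placeholder tokens (`W`, `w1`, `w2`) are hypothetical.

```python
from collections import Counter, defaultdict


class SenseChooser:
    """Choose a Traditional spelling for a Cyrillic word from context counts.

    Hypothetical sketch: training data are sentence pairs whose tokens are
    aligned one-to-one; context is the immediately preceding and following word.
    """

    def __init__(self):
        self.prior = defaultdict(Counter)    # cyrillic -> spelling counts
        self.context = defaultdict(Counter)  # (cyrillic, spelling) -> neighbor counts

    @staticmethod
    def _neighbors(tokens, i):
        return [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]

    def train(self, pairs):
        for cyr_tokens, trad_tokens in pairs:
            for i, (cyr, trad) in enumerate(zip(cyr_tokens, trad_tokens)):
                self.prior[cyr][trad] += 1
                self.context[(cyr, trad)].update(self._neighbors(cyr_tokens, i))

    def choose(self, tokens, i):
        """Return the best spelling for tokens[i], or None if the word is unseen."""
        candidates = self.prior.get(tokens[i])
        if not candidates:
            return None
        neighbors = self._neighbors(tokens, i)

        def score(spelling):
            # Context evidence first; corpus frequency breaks ties.
            counts = self.context[(tokens[i], spelling)]
            return (sum(counts[w] for w in neighbors), candidates[spelling])

        return max(candidates, key=score)


def recognition_rate(chooser, test_items):
    """test_items: (cyrillic_tokens, index, gold_spelling) triples."""
    correct = sum(chooser.choose(toks, i) == gold for toks, i, gold in test_items)
    return correct / len(test_items)
```

More aligned training sentences fill in more context counts, so fewer decisions fall back to the frequency prior; this is consistent with the observation that a larger training database can raise the recognition rate.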
Keywords/Search Tags: Cyrillic Mongolian, Traditional Mongolian scripts, Finite State Automata, Computational Linguistics