| With the rapid development of science and technology, research on Mongolian information processing has made great progress within natural language processing. In Mongolian character encoding, however, many Mongolian documents and websites do not store information in a uniform encoding. This hampers information exchange and seriously hinders the development of Mongolian information processing technology. In 2000, the ISO and Unicode technical committees formulated and promulgated a standard coded character set for Mongolian, which supports both information sharing and the standardization of Mongolian information processing. At present, most of the various Mongolian word encodings can be converted into the minimum morpheme encoding, and good results have been achieved in this area, but little progress has been made on converting the minimum morpheme encoding into the standard encoding. How to perform this conversion is an active research topic. To convert the Mongolian minimum morpheme encoding into the standard encoding, the main work covers the following aspects: (1) Using the dictionary conversion method, the dictionary is divided into a whole-word dictionary and a stem + affix dictionary. The whole-word dictionary is then divided by part of speech, and the stem + affix dictionary is divided into negative and positive parts. (2) First, conversion from the minimum morpheme encoding to the standard encoding is implemented with a hidden Markov model. Second, a data smoothing algorithm is added to the hidden Markov model to solve the zero-probability problem. Finally, the traditional hidden Markov model considers only the association with the preceding encoded characters during conversion and ignores the following characters, so relevant encoding information is lost. The preceding and following codes are therefore both added to the model, and encoding conversion with a second-order hidden Markov model is implemented. (3) The two methods above are combined to further improve the accuracy of Mongolian word encoding conversion. Experiments with these methods were carried out on a corpus of more than 1.5 million groups. The results show that the dictionary combined with the hidden Markov model is the most suitable method for converting the minimum morpheme encoding to the standard encoding, and that it improves conversion accuracy compared with existing methods. |
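
The hidden Markov model step in aspect (2) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a first-order model rather than the second-order one described, it assumes add-k (Laplace) smoothing because the abstract does not name the smoothing algorithm, and the class name `SmoothedHMM` and the toy symbols (`x`, `y`, `a`, `X`, `Y`, `A1`, `A2`) are hypothetical. Observations stand for minimum morpheme codes and hidden states for standard codes.

```python
import math
from collections import Counter, defaultdict


class SmoothedHMM:
    """First-order HMM with add-k smoothing (an assumed smoothing choice)."""

    def __init__(self, sequences, k=1.0):
        # sequences: list of [(observation, state), ...] aligned training pairs
        self.k = k
        self.init = Counter()               # counts of sequence-initial states
        self.trans = defaultdict(Counter)   # trans[prev_state][state] counts
        self.emit = defaultdict(Counter)    # emit[state][observation] counts
        states, observations = set(), set()
        for seq in sequences:
            prev = None
            for obs, state in seq:
                states.add(state)
                observations.add(obs)
                if prev is None:
                    self.init[state] += 1
                else:
                    self.trans[prev][state] += 1
                self.emit[state][obs] += 1
                prev = state
        self.states = sorted(states)
        self.n_obs = len(observations) + 1  # one extra slot for unseen symbols

    def _log(self, count, total, n):
        # Add-k smoothing keeps every probability above zero when k > 0.
        return math.log((count + self.k) / (total + self.k * n))

    def log_init(self, s):
        return self._log(self.init[s], sum(self.init.values()), len(self.states))

    def log_trans(self, p, s):
        return self._log(self.trans[p][s], sum(self.trans[p].values()), len(self.states))

    def log_emit(self, s, o):
        return self._log(self.emit[s][o], sum(self.emit[s].values()), self.n_obs)

    def decode(self, obs_seq):
        """Viterbi decoding: most probable state sequence for obs_seq."""
        if not obs_seq:
            return []
        score = {s: self.log_init(s) + self.log_emit(s, obs_seq[0]) for s in self.states}
        back = []
        for obs in obs_seq[1:]:
            new_score, pointers = {}, {}
            for s in self.states:
                best_prev = max(self.states, key=lambda p: score[p] + self.log_trans(p, s))
                new_score[s] = score[best_prev] + self.log_trans(best_prev, s) + self.log_emit(s, obs)
                pointers[s] = best_prev
            back.append(pointers)
            score = new_score
        state = max(self.states, key=score.get)
        path = [state]
        for pointers in reversed(back):
            state = pointers[state]
            path.append(state)
        return path[::-1]


# Toy data: the ambiguous code "a" maps to "A1" after "X" and to "A2" after "Y".
toy = [[("x", "X"), ("a", "A1")], [("y", "Y"), ("a", "A2")]] * 2
model = SmoothedHMM(toy, k=1.0)
print(model.decode(["x", "a"]))  # ['X', 'A1']
print(model.decode(["y", "a"]))  # ['Y', 'A2']
```

With `k=0` an unseen transition or emission would have probability zero and `math.log` would fail, which is the zero-probability problem the smoothing step addresses. A second-order variant would condition each transition on more context than the single previous state used here.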