Research On The Conversion Of Mongolian Minimum Morpheme Coding To Standard Coding Based On Dictionary And HMM

Posted on:2019-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y Xu

Full Text:PDF

GTID:2428330563456736

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of science and technology, research on Mongolian information processing has made great progress within Natural Language Processing. In Mongolian character coding, however, many Mongolian documents and websites do not store information in a uniform coding form. This is not only bad for information exchange but also seriously hinders the development of Mongolian information processing technology. In 2000, the ISO and Unicode technical committees formulated and promulgated the standard coded Mongolian character set, which is conducive both to information sharing and to the standardization of Mongolian information processing. At present, most of the various forms of Mongolian word coding can be converted into minimum morpheme coding, and good results have been achieved in this area, but very little progress has been made in converting minimum morpheme coding into standard coding. How to perform this conversion remains an active research question. To realize the conversion of Mongolian minimum morpheme coding to standard coding, the main work covers the following aspects:

(1) Using the dictionary conversion method, the dictionary is divided into a whole-word dictionary and a stem + affix dictionary. The whole-word dictionary is then divided by part of speech, and the stem + affix dictionary is divided into negative and positive parts.

(2) First, the conversion of minimum morpheme coding to standard coding is realized with a hidden Markov model. Second, a data smoothing algorithm is added to the hidden Markov model to solve the zero-probability problem. Finally, a traditional hidden Markov model considers only the association with the preceding coded characters during conversion and ignores the following ones, so relevant coding information is lost. To address this, both the preceding and the following codes are incorporated into the model, giving a conversion based on a second-order hidden Markov model.

(3) By combining the two methods above, the accuracy of Mongolian word coding conversion is further improved.

Experiments with these methods were carried out on a corpus of more than 1.5 million groups. The results show that combining the dictionary with the hidden Markov model is the most suitable of these approaches for converting minimum morpheme coding to standard coding, and that it improves conversion accuracy compared with existing methods.
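The combination of dictionary lookup and hidden Markov model decoding described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the function names (train_hmm, viterbi, convert), the toy Latin-letter symbols standing in for minimum morpheme codes (lowercase) and standard codes (uppercase), the choice of add-one (Laplace) smoothing, and the use of a first-order model rather than the second-order model the abstract reports.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical start-of-word marker


def train_hmm(pairs, alpha=1.0):
    """Count transitions and emissions from aligned (source, target) strings."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    states, symbols = set(), set()
    for src, tgt in pairs:
        prev = START
        for obs, state in zip(src, tgt):
            trans[prev][state] += 1
            emit[state][obs] += 1
            states.add(state)
            symbols.add(obs)
            prev = state
    return {"trans": trans, "emit": emit, "states": sorted(states),
            "symbols": symbols, "alpha": alpha}


def _log_prob(table, given, event, n_events, alpha):
    """Add-alpha smoothed log probability, so unseen events never get zero."""
    row = table.get(given, {})
    total = sum(row.values())
    return math.log((row.get(event, 0) + alpha) / (total + alpha * n_events))


def viterbi(model, observed):
    """Return the most probable target-code sequence for the observed codes."""
    if not observed:
        return []
    states, a = model["states"], model["alpha"]
    trans, emit = model["trans"], model["emit"]
    n_states = len(states)
    n_symbols = len(model["symbols"]) + 1  # one extra slot for unseen symbols

    score = {s: _log_prob(trans, START, s, n_states, a)
                + _log_prob(emit, s, observed[0], n_symbols, a)
             for s in states}
    back = []
    for obs in observed[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best = max(states,
                       key=lambda p: score[p] + _log_prob(trans, p, s, n_states, a))
            new_score[s] = (score[best]
                            + _log_prob(trans, best, s, n_states, a)
                            + _log_prob(emit, s, obs, n_symbols, a))
            pointers[s] = best
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the highest-scoring final state.
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def convert(word, lexicon, model):
    """Use the dictionary entry when one exists, otherwise decode with the HMM."""
    if word in lexicon:
        return lexicon[word]
    return "".join(viterbi(model, word))


# Toy data: "x" is ambiguous and becomes "P" after "A" but "Q" after "B".
model = train_hmm([("ax", "AP"), ("bx", "BQ")])
lexicon = {"ab": "AB"}  # hypothetical whole-word dictionary entry
print(convert("ab", lexicon, model))  # dictionary hit
print(convert("bx", lexicon, model))  # HMM fallback
```

In this sketch the dictionary takes priority and the model handles only words the dictionary does not cover, which is one plausible way to combine the two methods. Extending the state to pairs of adjacent codes would be one way to approximate a second-order model.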

Keywords/Search Tags:

minimum morpheme coding, standard coding, dictionary, hidden Markov model, smoothing algorithm

Related items

1	Research On Hidden Markov Model Based Distributed Arithmetic Coding
2	Protein-coding gene structure prediction using generalized hidden Markov models
3	Study On Chinese Named Entity Recognition Based On Hidden Markov Model
4	Joint Source-Channel (De)Coding/Modulation In Communication Systems
5	The Research On The Coding Algorithm In The Video Coding Standard Of H.264/AVC
6	Research On Key Techniques Of Internet-oriented Video Transmission
7	Research On Intraprediction and Entropy Coding Technology Oriented The Next Generation Of Video Coding Standard
8	Minimum Dictionary Learning Based On Non-local Sparse Model
9	Research On Key Technologies Of Protograph LDPC Code-Based Distributed Joint Source-Channel Coding
10	Research On Multiresolution Hidden Markov Model For Image Denoising