
Mongolian Language Model Based On Recurrent Neural Network

Posted on: 2018-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: X F Yan
Full Text: PDF
GTID: 2348330515452355
Subject: Software engineering

Abstract/Summary:
Language models are an important component of natural language processing, and the N-gram language model is currently the most widely used. With the development of deep learning, deep neural networks have in recent years been applied to automatic speech recognition, bringing many new research topics; neural-network-based language modeling is an important research direction in deep learning. The Mongolian language model plays an important role in Mongolian information processing technology, such as Mongolian automatic speech recognition, Mongolian information retrieval, and Mongolian machine translation. At present, neural-network-based language models are widely used for English and Chinese, but they are rarely applied to Mongolian. This dissertation concentrates on how neural-network-based language models can be applied to the Mongolian language.

The Mongolian language has wide influence around the world. However, many Mongolian words share the same presentation form while their underlying encodings differ, which makes statistics and retrieval on such a Mongolian corpus very difficult. This dissertation focuses on solving this statistics and retrieval problem, and thereby on improving the performance of the Mongolian language model.

First, this dissertation proposes a method that merges words with the same presentation form through a set of intermediate characters. Then, N-gram language models are built on Latin characters and on intermediate characters, respectively. Moreover, fast recurrent neural network language models (FRNNLM) are likewise constructed on Latin characters and on intermediate characters. Next, a scheme for combining the N-gram language model with the FRNNLM is realized. Finally, the performance of the proposed language models is evaluated using perplexity, and the models are applied to Mongolian automatic speech recognition to compare word error rates (WER).

Experimental results show that the language model based on intermediate characters reduces the vocabulary size by 41% compared with the language model based on Latin characters. Meanwhile, the intermediate-character language models (both 3-gram and FRNNLM) also reduce perplexity by 40% compared with the corresponding Latin-character language models. For Mongolian automatic speech recognition, the intermediate-character language models (3-gram, FRNNLM, and 3-gram+FRNNLM) reduce the WER by 20% compared with the Latin-character language models; among them, the combination of the 3-gram model with the FRNNLM performs best. Therefore, the proposed method effectively improves the accuracy of Mongolian automatic speech recognition.
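The abstract does not state how the 3-gram model and the FRNNLM are combined. A common scheme is linear interpolation of their word probabilities, with perplexity computed from the interpolated model; the minimal Python sketch below illustrates that standard approach under this assumption. The functions ngram_prob and rnnlm_prob and the sample sentence are hypothetical stand-ins, not the thesis's actual models or data.

import math

def ngram_prob(word, history):
    # Stand-in for a 3-gram lookup, P(word | previous two words).
    return 0.1

def rnnlm_prob(word, history):
    # Stand-in for the recurrent network's softmax output, P(word | full history).
    return 0.2

def interpolated_prob(word, history, lam=0.5):
    # Linear interpolation of the two model probabilities with weight lam.
    return lam * ngram_prob(word, history) + (1.0 - lam) * rnnlm_prob(word, history)

def perplexity(sentences, prob_fn):
    # Perplexity = exp(-average log-probability per token).
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        history = []
        for word in sentence:
            log_prob += math.log(prob_fn(word, history))
            history.append(word)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

test_set = [["ene", "bol", "mongol", "kele"]]  # hypothetical tokenized sentence
print(perplexity(test_set, interpolated_prob))

In practice the interpolation weight would be tuned on a held-out set, and the same perplexity routine would be run separately for the Latin-character and intermediate-character models to reproduce the comparison reported above.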
Keywords/Search Tags:Language Model, Middle Character, Automatic Speech Recognition, N-gram, FRNNLM