
Mongolian Language Model Based On Recurrent Neural Network

Posted on: 2018-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: X F Yan
Full Text: PDF
GTID: 2348330515452355
Subject: Software engineering

Abstract/Summary:
Language models are an important component of natural language processing, and the N-gram language model is currently the most widely used. With the development of deep learning, deep neural networks have in recent years been applied to automatic speech recognition, bringing many new research topics; neural-network-based language modeling is an important research direction in deep learning. The Mongolian language model plays an important role in Mongolian information processing technology, such as Mongolian automatic speech recognition, Mongolian information retrieval, and Mongolian machine translation. At present, neural-network-based language models are widely used for English and Chinese, but they are rarely applied to Mongolian. This dissertation concentrates on how neural-network-based language models can be applied to the Mongolian language.

The Mongolian language has wide influence around the world. However, many Mongolian words share the same presentation form while their underlying encodings differ, which makes statistics and retrieval on such a Mongolian corpus very difficult. This dissertation focuses on solving this statistics and retrieval problem, and thereby on improving the performance of the Mongolian language model.

First, this dissertation proposes a method that merges words with the same presentation form through a set of intermediate characters. Then, N-gram language models are built on Latin characters and on intermediate characters, respectively. Moreover, fast recurrent neural network language models (FRNNLM) are likewise constructed on Latin characters and on intermediate characters. Next, a scheme for combining the N-gram language model with the FRNNLM is realized. Finally, the performance of the proposed language models is evaluated using perplexity, and the models are applied to Mongolian automatic speech recognition to compare word error rates (WER).

Experimental results show that the language model based on intermediate characters reduces the vocabulary size by 41% compared with the language model based on Latin characters. Meanwhile, the intermediate-character language models (both 3-gram and FRNNLM) also reduce perplexity by 40% compared with the corresponding Latin-character language models. For Mongolian automatic speech recognition, the intermediate-character language models (3-gram, FRNNLM, and 3-gram+FRNNLM) reduce the WER by 20% compared with the Latin-character language models; among them, the combination of the 3-gram model with the FRNNLM performs best. Therefore, the proposed method effectively improves the accuracy of Mongolian automatic speech recognition.
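The abstract does not state how the 3-gram model and the FRNNLM are combined. A common scheme is linear interpolation of their word probabilities, with perplexity computed from the interpolated model; the minimal Python sketch below illustrates that standard approach under this assumption. The functions ngram_prob and rnnlm_prob and the sample sentence are hypothetical stand-ins, not the thesis's actual models or data.

import math

def ngram_prob(word, history):
    # Stand-in for a 3-gram lookup, P(word | previous two words).
    return 0.1

def rnnlm_prob(word, history):
    # Stand-in for the recurrent network's softmax output, P(word | full history).
    return 0.2

def interpolated_prob(word, history, lam=0.5):
    # Linear interpolation of the two model probabilities with weight lam.
    return lam * ngram_prob(word, history) + (1.0 - lam) * rnnlm_prob(word, history)

def perplexity(sentences, prob_fn):
    # Perplexity = exp(-average log-probability per token).
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        history = []
        for word in sentence:
            log_prob += math.log(prob_fn(word, history))
            history.append(word)
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)

test_set = [["ene", "bol", "mongol", "kele"]]  # hypothetical tokenized sentence
print(perplexity(test_set, interpolated_prob))

In practice the interpolation weight would be tuned on a held-out set, and the same perplexity routine would be run separately for the Latin-character and intermediate-character models to reproduce the comparison reported above.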
Keywords/Search Tags:Language Model, Middle Character, Automatic Speech Recognition, N-gram, FRNNLM