
Research On Tibetan Language Model Based On Recurrent Neural Network

Posted on: 2019-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: T T Shen
Full Text: PDF
GTID: 2428330593451021
Subject: Computer Science and Technology

Abstract/Summary:
With the development of artificial intelligence, the recurrent neural network language model (RNNLM) has come to prevail in a range of speech and natural language processing tasks in recent years, such as machine translation, information retrieval, and speech recognition. RNNLMs have surpassed the traditional N-gram language model and become the state of the art. For Tibetan, however, a large quantity of data is required to train a language model with good performance, which makes modeling this low-resource language difficult.

This thesis addresses the issue mainly through model training techniques and Tibetan-specific features. We propose an interpolated language model, a domain-adaptation recurrent neural network language model, and a recurrent neural network language model that uses Tibetan radicals. To verify the effectiveness of the proposed methods, perplexity (PPL) is used to evaluate the language models. In addition, we build a speech recognition system and use character error rate (CER) as a second evaluation criterion.

Experiments are conducted on two data sets of different sizes: a small Tibetan data set (STD) and a large Tibetan data set (LTD). The results show that the proposed language models perform better. Compared to a Kneser-Ney 3-gram model, the interpolated language model achieves a 16.1% relative PPL reduction and a 6.3% relative CER reduction, and the domain-adaptation recurrent neural network language model achieves a 34.2% relative PPL reduction. Compared to the standard recurrent neural network language model, the PPL of the model using Tibetan radicals is reduced by 13.5%. This research thus alleviates the low-resource problem and improves the performance of Tibetan language modeling.
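The interpolated language model and the perplexity metric mentioned above can be illustrated with a minimal sketch: an interpolated model linearly mixes the per-token probabilities of two component models, and perplexity is the exponential of the average negative log-probability. The mixing weight and the toy probabilities below are illustrative assumptions, not the thesis's actual values or implementation.

```python
import math

def interpolate(p_rnn, p_ngram, lam=0.5):
    """Linear interpolation of two language-model probabilities.
    lam weights the RNNLM; (1 - lam) weights the n-gram model."""
    return lam * p_rnn + (1.0 - lam) * p_ngram

def perplexity(probs):
    """Perplexity of a token sequence from its per-token probabilities:
    exp of the mean negative log-probability."""
    nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(nll)

# Toy per-token probabilities from two hypothetical component models.
rnn_probs = [0.20, 0.10, 0.30]
ngram_probs = [0.10, 0.05, 0.25]
mixed = [interpolate(a, b, lam=0.6) for a, b in zip(rnn_probs, ngram_probs)]
print(round(perplexity(mixed), 2))
```

A lower perplexity on held-out text indicates a better model, which is why the abstract reports relative PPL reductions against the Kneser-Ney 3-gram baseline.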
Keywords/Search Tags:Tibetan, Language model, N-gram, RNNLM