
Research On Large Language Models And Its Application In Machine Translation

Posted on: 2010-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: R Y Zhang
GTID: 2178360275994872
Subject: Computer application technology
Abstract/Summary:
Advances in natural language processing and the availability of large-scale corpora have made large-scale statistical language modeling both feasible and necessary. In machine translation, the language model helps select good translations among candidates. Published results show that translation quality, as measured by BLEU score, improves steadily as language model size increases, so large language models have become an active research area in recent years.

In this thesis, based on an analysis of milestone techniques in language modeling, we design and implement a set of training tools for large language models and a group of interfaces for accessing the trained models.

First, we design and implement language model training tools applicable to the Google Web 1T corpus. We reduce space cost and improve performance by using more compact data structures, a simpler smoothing algorithm, and probability quantization. With these tools we can complete the language modeling task and generate language model files.

Next, we design and implement interfaces for accessing the language model, which natural language applications such as machine translation can call to obtain the probability of word sequences. To meet different needs, an existing language model can be accessed in three ways: through a dynamically linked library, through communication with a language model server, or through a distributed language model server.

Finally, we apply a large language model trained on Google Web 1T with our tools to machine translation. On the test set of the NIST 2008 machine translation evaluation, the BLEU score improves from 20.54 to 21.96, a relative gain of about 7%. These preliminary results show that a large language model does improve translation quality.
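
The abstract names probability quantization as one of the space-saving measures but does not say which scheme the tools use. As a minimal sketch, assuming uniform binning of log10 probabilities into 8-bit codes (the function names and sample values below are hypothetical and not taken from the thesis), the idea could look like this:

```python
def build_quantizer(log_probs, bits=8):
    """Return (lo, step) for uniform binning into 2**bits levels.

    Assumption: uniform bins over the observed range; the thesis may use
    a different scheme.
    """
    lo, hi = min(log_probs), max(log_probs)
    levels = (1 << bits) - 1
    # Guard against a zero range when all values are identical.
    step = (hi - lo) / levels if hi > lo else 1.0
    return lo, step


def quantize(log_prob, lo, step, bits=8):
    """Map a log-probability to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    code = round((log_prob - lo) / step)
    return max(0, min(levels, code))


def dequantize(code, lo, step):
    """Recover the approximate log-probability for a stored code."""
    return lo + code * step


# Hypothetical log10 probabilities for a handful of n-grams.
sample = [-0.5, -1.2, -3.7, -6.9, -2.25]
lo, step = build_quantizer(sample)
codes = [quantize(lp, lo, step) for lp in sample]
restored = [dequantize(c, lo, step) for c in codes]
```

Each probability is then stored as one byte instead of a 4- or 8-byte float, at the cost of a rounding error of at most half a bin width (`step / 2`) per value.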
Keywords/Search Tags:Large Language Model, Large-scale Corpus, Machine Translation