Research on Chinese-Uyghur Phrase Extraction in the Phrase-Based Translation Model

Posted on: 2011-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: G J Ren
GTID: 2178360305987268
Subject: Computer application technology
Abstract/Summary:
Bilingual phrase pair extraction is a key step in training the phrase translation model for phrase-based statistical machine translation. However, because bilingual parallel corpora are limited in size, the data sparsity problem is severe. Research on Chinese-Uyghur statistical machine translation is still at an early stage, so a study of Chinese-Uyghur phrase extraction is needed. This thesis uses the open-source SilkRoad toolkit and the word alignment tool GIZA++ included in it to train on a Chinese-Uyghur parallel corpus and complete the whole translation model training process. It also improves the phrase extraction algorithm and produces a complete phrase translation probability table. This work lays the groundwork for building a phrase-based Chinese-Uyghur statistical machine translation system. The improved phrase extraction algorithm works as follows. First, it handles the case where one Chinese word aligns to multiple Uyghur words, including non-consecutive ones. It also applies Och's method, extracting a phrase pair whenever the extraction condition is met. Finally, it extracts phrases in a way that takes the SOV sentence structure of Uyghur into account. Experiments show that the algorithm extracts more Chinese-Uyghur phrase pairs than before, which indicates that it is effective for phrase translation extraction.
Keywords/Search Tags: Translation Model, Phrase Extraction, Chinese-Uyghur phrase pairs
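The abstract builds on Och's method of extracting phrase pairs that are consistent with a word alignment. The following is a minimal sketch of that baseline idea only, not the thesis's improved algorithm. The function name `extract_phrases`, the `max_len` limit, and the placeholder tokens (`c1`, `u1`, and so on) are assumptions made for illustration, and the sketch omits the usual extension over unaligned boundary words.

```python
from typing import List, Set, Tuple


def extract_phrases(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],
    max_len: int = 4,
) -> Set[Tuple[str, str]]:
    """Collect phrase pairs consistent with a word alignment.

    `alignment` holds (source_index, target_index) links, such as those
    produced by a word aligner. A pair of spans is consistent when no word
    inside either span is linked to a word outside the other span.
    """
    pairs: Set[Tuple[str, str]] = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue  # a span with no links yields no phrase pair here
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Every link touching the target span must stay inside the source span.
            consistent = all(
                i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2
            )
            if consistent:
                pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


if __name__ == "__main__":
    # Placeholder tokens; the last two words are reordered on the target side.
    source = ["c1", "c2", "c3"]
    target = ["u1", "u2", "u3"]
    links = {(0, 0), (1, 2), (2, 1)}
    for pair in sorted(extract_phrases(source, target, links)):
        print(pair)
```

Under this baseline, a source word linked to two non-consecutive target words yields no phrase pair of its own when the target word between them is linked elsewhere. That gap appears to be the kind of case the one-to-many handling described in the abstract is meant to cover.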