| Bilingual phrase pair extraction is a key step in training the phrase translation model in phrase-based statistical machine translation. However, because bilingual parallel corpora are limited in size, the data sparsity problem is severe. Research on Chinese-Uyghur statistical machine translation is still at an early stage, so it is necessary to study the extraction of Chinese-Uyghur phrase pairs. This work uses the open-source SilkRoad toolkit and the word alignment tool GIZA++ to train on a Chinese-Uyghur parallel corpus, completes the full translation model training, improves the phrase extraction procedure, and finally obtains a complete phrase translation probability table. All of this lays the groundwork for building a phrase-based Chinese-Uyghur statistical machine translation system. An improved phrase extraction algorithm is proposed. It first handles the case in which one Chinese word aligns to multiple Uyghur words (including nonconsecutive ones), then applies Och's method, extracting a phrase pair whenever the condition is met, and finally extracts phrases that take into account the SOV sentence structure of Uyghur. Experiments show that the algorithm extracts more Chinese-Uyghur phrase pairs, so it is effective for phrase translation extraction. |
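
To make the baseline concrete, the following is a minimal sketch of alignment-consistent phrase pair extraction in the spirit of Och's method, which the proposed algorithm builds on. It is an illustration under stated assumptions, not the implementation used in this work: the function name `extract_phrases`, the `max_len` limit, and the placeholder tokens `c1`..`c3` and `u1`..`u4` are all hypothetical. The sketch also simplifies the method by not expanding phrases over unaligned target words, and it does not show the handling of nonconsecutive Uyghur words or the SOV-based step, since their details are not given here.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    src, tgt  : lists of tokens (source and target sentence)
    alignment : set of (i, j) pairs, src[i] aligned to tgt[j]
    max_len   : hypothetical cap on phrase length on either side
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency condition: no word inside the target span
            # may be aligned to a source word outside the source span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2)
                   for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Placeholder sentence pair: source word c2 aligns to two target words.
src = ["c1", "c2", "c3"]
tgt = ["u1", "u2", "u3", "u4"]
alignment = {(0, 0), (1, 2), (1, 3), (2, 1)}
pairs = extract_phrases(src, tgt, alignment)
```

In this toy example the span `c1 c2` yields no phrase pair, because its target span would include `u2`, which is aligned to `c3` outside the source span. The one-to-many link from `c2` is kept as the single pair (`c2`, `u3 u4`).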