
The enhancement of machine translation for low-density languages using Web-gathered parallel texts

Posted on: 2008-12-02
Degree: M.S
Type: Thesis
University: University of North Texas
Candidate: Mohler, Michael Augustine Gaylord
Full Text: PDF
GTID: 2445390005979132
Subject: Computer Science
Abstract/Summary:
The majority of the world's languages are poorly represented in informational media such as radio, television, newspapers, and the Internet. Translation into and out of these languages may offer their speakers a way to interact with the wider world, but current statistical machine translation models are effective only with a large corpus of parallel texts (texts in two languages that are translations of one another), which most languages lack.

This thesis describes the Babylon project, which attempts to alleviate this shortage by supplementing existing parallel texts with texts gathered automatically from the Web, specifically targeting pages that contain text in a pair of languages. Results indicate that parallel texts gathered from the Web can be used effectively as a source of training data for machine translation and can significantly improve translation quality for text in a similar domain. However, the small quantity of high-quality parallel texts for low-density languages on the Web remains a significant obstacle.
Keywords/Search Tags: Languages, Parallel texts, Translation