Named entity translation: A statistical approach

Posted on: 2003-04-19
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Al-Onaizan, Yaser M
Full Text: PDF
GTID: 1468390011986959
Subject: Computer Science
Abstract/Summary:
New words and phrases are introduced in news stories on a daily basis in the form of personal names, organizations, locations, temporal phrases, etc. Such phrases are referred to as named-entity phrases in the literature. While named-entity phrases can often be identified automatically in running text, they are some of the most difficult phrases to translate, because new phrases can appear from nowhere and because many are domain-specific and not found in bilingual dictionaries. This is evidenced by the fact that a state-of-the-art commercial system translates named entities incorrectly 50% of the time. For example, commercial systems often produce translations such as Koln Baol (instead of Colin Powell) or O'Neill's urine (instead of Paul O'Neill).

We present a novel statistical translation algorithm that combines easy-to-obtain bilingual and monolingual resources. We successfully applied this algorithm to translating Arabic named-entity phrases into English. The algorithm does not require any hard-to-obtain linguistic resources and should be portable to other language pairs fairly easily.

We report translation accuracy on development and blind test sets. Our translation algorithm outperforms commercial systems and rivals, and in some cases (e.g., on person names) exceeds, the performance of human translators. We compare the translation accuracy of our system, human translators, a state-of-the-art commercial system, and a research-based Statistical Machine Translation system under two evaluation metrics: exact matching and human-subjective evaluation. A full analysis of translation errors is also presented.

Because named-entity phrases are very frequent in newspaper text, a system that can identify and translate them is an important tool for many Natural Language Processing (NLP) applications such as Machine Translation (MT), Information Retrieval (IR), Question Answering (QA), Message Understanding (MU), and Summarization.

We conclude this dissertation with an emphasis on the main contributions of this work and some insights on future work.
Keywords/Search Tags:Translation, Phrases, Statistical