Identification Of English Functional Noun Phrases For Machine Translation

Posted on: 2013-06-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Ma
GTID: 1228330395499239
Subject: Computer application technology
Abstract/Summary:
English NP chunking plays an important role in machine translation. One major problem with machine translation lies in its limited ability to resolve the ambiguities caused by nouns. This paper therefore presents a study on the automatic identification of one kind of English noun phrase (NP), the functional noun phrase, with the aim of resolving the structural ambiguity caused by noun phrases in English-Chinese machine translation (MT). Functional noun phrases are noun phrases defined by their syntactic functions in clauses, so the structural ambiguity caused by noun phrases can be resolved by identifying those syntactic functions. The study covers three aspects: the analysis of ambiguity problems in English-Chinese machine translation, MT-oriented English part-of-speech (POS) tagging, and NP chunking. The NP chunking study is carried out on a self-built English-Chinese parallel corpus in the business domain, consisting of 200,000 English words and 270,000 Chinese characters. The main research work can be summarized as follows:

(1) The analysis of the structural ambiguity problems caused by noun phrases in English-Chinese machine translation. This paper compares the ambiguity resolution of two MT approaches, Rule-Based Machine Translation (RBMT) and Statistical Machine Translation (SMT), by analyzing Chinese translations produced by the SYSTRAN and Google translation systems, using a manual machine translation evaluation method that combines faithfulness and smoothness. The results show that both lexical ambiguity and structural ambiguity are closely related to NP chunking and understanding, and that the surface structure N1 + prep + N2 is a typical source of ambiguity.
The four main structural ambiguity problems caused by NPs are: ambiguity caused by NPs that form an inseparable part of the verb phrase, ambiguity caused by particles, ambiguity caused by "prep + noun" structures functioning as postmodifiers, and ambiguity caused by "prep + noun" structures functioning as adjuncts.

(2) The MT-oriented English POS tagging. The tagger is designed to supply POS tags to the subsequent NP chunking task, for the purpose of machine translation. Following the analysis of pre-test results, a new tagset based on the Penn Treebank tagset is designed for MT purposes, and the English sentences in the corpus are annotated with this new tagset. A statistical method combined with rules is applied: the maximum entropy model is adopted, and a rule-based approach is used in post-processing. Experiments show that the tagger achieves an accuracy of 98.14% in the open test and 85.65% on unknown words.

(3) The NP chunking. Both the scope of NP chunks and their syntactic function types are identified in this task. The function tags of noun phrases are categorized on the basis of systemic functional grammar. The conditional random fields model is adopted, combining semantic information with language rules. The system performance is further compared in the open tests with that of a model trained using Stanford POS tags. Test results show that the system achieves an F-score of 89.04% in the open test using the gold-standard tags of the new tagset. This also shows that the new tagset is better suited to NP chunking: it increases the F-score by 2.21% compared with the model using the Penn Treebank POS tags.
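
The abstract reports chunking quality as an F-score but does not state how it is computed. As a minimal sketch, assuming the common exact-match convention in which a predicted chunk counts as correct only when both its span and its function label match the gold standard, the metric could be computed as below. The BIO encoding and the function labels `SUBJ` and `OBJ` are hypothetical illustrations, not the tagset of the dissertation.

```python
def extract_chunks(tags):
    """Return the set of (start, end, label) spans in a BIO tag sequence.

    Tags are assumed to look like "B-SUBJ", "I-SUBJ" or "O"; the label
    names are hypothetical and stand in for NP function types.
    """
    chunks = set()
    start, label = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            chunks.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:
            # Lenient handling: a stray I- tag opens a new chunk.
            start, label = i, tag[2:]
    return chunks


def f_score(gold_tags, pred_tags):
    """Exact-match chunk-level F-score (harmonic mean of precision and recall)."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One sentence with two gold chunks; the second predicted chunk is too short.
gold = ["B-SUBJ", "I-SUBJ", "O", "B-OBJ", "I-OBJ"]
pred = ["B-SUBJ", "I-SUBJ", "O", "B-OBJ", "O"]
print(f_score(gold, pred))  # 0.5
```

Under this convention a chunk with the right boundaries but the wrong function label is also counted as an error, which matches a task that identifies both the scope and the function type of each NP.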
Keywords/Search Tags: Machine Translation, English Functional Noun Phrase, Structural Ambiguity, Part-of-speech Tagging, NP Chunking