
Realization Of Design And Evaluation Of System For Speech Translation Lexicon

Posted on: 2005-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Cheng
GTID: 2168360125471024
Subject: Communication and Information System
Abstract/Summary:
People have long dreamed of gradually, and eventually completely, removing the obstacles between speakers of different languages and cultural backgrounds, so that they can communicate freely without being limited by language. With the international rise of speech translation technology, this dream is beginning to come true. Research in every country emphasizes not only direct computer translation of speech between languages but also computer assistance for communication between people with different language backgrounds. Speech translation integrates speech recognition, machine translation and speech synthesis, which makes it different from, and more challenging than, ordinary text translation. To cover real-world language reasonably well, a speech translation system needs a bilingual translation lexicon of hundreds of thousands of bilingual machine translation units. Algorithms must therefore be designed to extract such a large-scale bilingual translation lexicon automatically from bilingual texts. Our task is to acquire linguistic knowledge and translation rules automatically or semi-automatically from language corpora through machine learning, in order to realize machine translation. This problem is the most important component of machine translation in a speech translation system and has become a new breakthrough point for machine translation.
In this research, we construct a prototype of a large-scale translation lexicon for use in a speech translation system. Building on earlier research, we retain its merits and advantages, improve on some of its defects and limitations, and propose a new algorithm. We study, in turn, the automatic extraction of three kinds of translation lexicon entries: single source word to single target word, single source word to target multiword unit, and source multiword unit to target multiword unit. For this we use information from a paraphrase dictionary, etyma (word roots), co-occurrence probabilities, differences in contextual collocation, and other sources. At the same time, we combine threshold filtering with association values, and we take into account the characteristics of Chinese and of spoken language. We increase the accuracy of the higher-level translation lexicon by classifying lexicon entries with multiple association parameters and with the mutual correspondence between source units and target units. In addition, to meet the demands of building a very large-scale translation lexicon, we propose an improved algorithm that covers single-word units and multiword units simultaneously and produces a lexicon that can be applied directly to the substitution of translation units. Finally, we evaluate three commonly used algorithms and one improved algorithm for translation lexicon extraction, and we present and analyze the experimental results. We also suggest approaches to problems that require further research.
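The abstract does not name the association measure or spell out the "mutual correspondence" step, so the following is only a minimal sketch under stated assumptions. It uses the Dice coefficient as a stand-in association value, a minimum co-occurrence count plus a Dice threshold as the threshold filter, and a mutual-best check (keep a pair only if each side is the other's highest-scoring partner) as one possible reading of the source-to-target mutual correspondence. Units of one and two words are scored together, which loosely mirrors covering single-word and multiword units at the same time. The function names `units` and `extract_lexicon`, the parameters `max_n`, `min_count` and `threshold`, and the toy corpus are all hypothetical. The sketch also assumes a sentence-aligned bitext with the Chinese side already word-segmented by spaces, and it does not model the paraphrase dictionary, etyma or context features mentioned above.

```python
from collections import Counter
from itertools import product


def units(tokens, max_n):
    """All contiguous word n-grams (n = 1..max_n) of one sentence, as space-joined strings."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def extract_lexicon(bitext, max_n=2, min_count=2, threshold=0.5):
    """Hypothetical sketch: Dice association + threshold filter + mutual-best check.

    bitext: list of (source_sentence, target_sentence) pairs, sentence-aligned and
    whitespace-tokenized (Chinese side assumed to be pre-segmented).
    Returns {(source_unit, target_unit): dice_score}.
    """
    src_freq, tgt_freq, pair_freq = Counter(), Counter(), Counter()
    for src, tgt in bitext:
        s_units = units(src.split(), max_n)
        t_units = units(tgt.split(), max_n)
        # Sentence-level counts: each unit is counted at most once per sentence pair.
        src_freq.update(s_units)
        tgt_freq.update(t_units)
        pair_freq.update(product(s_units, t_units))

    # Threshold filter on the raw co-occurrence count and on the Dice coefficient.
    candidates = {}
    for (s, t), c in pair_freq.items():
        dice = 2 * c / (src_freq[s] + tgt_freq[t])
        if c >= min_count and dice >= threshold:
            candidates[(s, t)] = dice

    # Mutual-best check: keep (s, t) only if t is s's best target and s is t's best source.
    # Ties are broken by the unit string, so the result does not depend on set ordering.
    best_tgt, best_src = {}, {}
    for (s, t), d in candidates.items():
        if s not in best_tgt or (d, t) > best_tgt[s]:
            best_tgt[s] = (d, t)
        if t not in best_src or (d, s) > best_src[t]:
            best_src[t] = (d, s)
    return {
        (s, t): d
        for (s, t), d in candidates.items()
        if best_tgt[s][1] == t and best_src[t][1] == s
    }


# Toy English-Chinese corpus (invented for illustration only).
bitext = [
    ("good morning", "早上好"),
    ("good morning sir", "先生 早上好"),
    ("thank you sir", "谢谢 先生"),
    ("thank you", "谢谢"),
    ("good idea", "好 主意"),
    ("morning tea", "早 茶"),
    ("thank god", "感谢 上帝"),
    ("you go", "你 走"),
]

lexicon = extract_lexicon(bitext)
for (s, t), d in sorted(lexicon.items()):
    print(f"{s} -> {t}  dice={d:.2f}")
```

On this toy corpus "good" and "morning" each reach a Dice score of 0.8 with 早上好 and pass the threshold. The two-word unit "good morning" scores 1.0, however, so only the pair ("good morning", 早上好) survives the mutual-best check. This illustrates why a plain threshold on a single association value is usually combined with some competition between overlapping candidates.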
Keywords: Speech Translation System, Machine Translation, Spoken Language Translation, Translation Lexicon, Language Corpus