
Research On Technologies Of Evaluation And Diagnosis Of Machine Translation

Posted on: 2011-07-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Wang
GTID: 1118360332957973
Subject: Computer application technology
Abstract/Summary:
Human society is a complex, multicultural whole, and the interaction and mutual influence among cultures drive its development. In today's information era, language is the main carrier of cultural exchange, and translation between languages is the key to that exchange. Faced with massive amounts of multilingual information, human translation alone falls far short of demand, so the automatic translation of natural languages has become a major research focus. Over the past ten years, machine translation has been one of the main research topics in artificial intelligence.
Within machine translation research, evaluation is a key technology of great importance. An evaluation method measures the performance of a machine translation system, points out its problems, and thereby guides system development. An accurate evaluation method is the main basis of the development process and one of the main forces driving progress in machine translation; it can be said that without effective evaluation methods, machine translation systems cannot advance.
After years of development, research on automatic machine translation evaluation worldwide has achieved fruitful results, mainly in evaluation that compares a system's translation against reference translations. Macro-evaluation methods based on overall similarity have made some progress, but their performance still needs improvement. Moreover, as machine translation systems become more complex, traditional macro-evaluation can hardly meet current research needs: researchers and developers need automatic evaluation methods that provide more extensive information.
To address these problems, this dissertation aims to further improve the performance of automatic macro-evaluation methods and to introduce micro-level evaluation, enabling effective automatic diagnosis of machine translation. It studies in depth several key technologies: string-similarity-based macro-evaluation, reference coverage, and monolingual and bilingual automatic diagnosis.
1. Skip-N-gram-based automatic macro-evaluation. String-similarity-based evaluation methods are widely accepted because they are fast, stable, and broadly applicable. To improve their performance, this research proposes SNR, an automatic evaluation metric for machine translation systems. SNR extends the idea of skip-bigrams to larger spans and multiple statistics, and uses SVM regression to tune the weights of these statistics. In the authoritative NIST 2008 evaluation, the metric achieved two first-place results and one second-place result.
As an application of macro-evaluation, the research also proposes combining machine translation systems using macro-evaluation scores. System combination has been widely explored in machine translation, especially since MT systems built on various architectures emerged. This research describes an improved sentence-level strategy for combining MT outputs that balances the stability and the effectiveness of the combination. When computing the risk of each hypothesis in the N-best list, the hypotheses are weighted by the performance of the MT system that produced them, measured with state-of-the-art automatic evaluation metrics on the development data. This approach ranked first in the 2008 and 2009 domestic evaluation campaigns.
2. Syntax-based extension of the reference set.
Because language varies so widely, reference coverage is critical for the reference-based automatic evaluation of machine translation systems. This research proposes a method that extends the reference set using only multiple human references and their syntactic structures: syntactically equivalent constituents in the reference sentences are identified and recombined to generate new references. The method needs no external knowledge and can find equivalents for long sub-segments of reference sentences. With the extended reference set, the performance of macro-evaluation methods improves further.
3. Automatic diagnostic evaluation of machine translation using monolingual linguistic categories and check-points. Micro-evaluation, i.e., diagnostic evaluation, is the newest topic in machine translation evaluation; it has attracted wide attention in China and abroad but is still at an early stage. This research presents MCDI, an automatic diagnostic evaluation platform that provides multi-factor evaluation based on linguistic categories and automatically constructed linguistic check-points. Instances of various linguistic categories, together with their references, are composed into test cases called linguistic check-points, and a method is presented that extracts these check-points automatically from parallel sentences. Using linguistic categories and check-points, the method monitors how well an MT system translates important linguistic phenomena and thus provides diagnostic evaluation. This work has been adopted as one of the evaluation criteria in the domestic community.
4. Automatic diagnosis with bilingual transformation analysis. The fundamental task of machine translation is to transform the source language into the target language, so the quality of this bilingual transformation is the central issue of machine translation.
Building on the monolingual diagnosis method, this research proposes BIDI, an automatic diagnostic evaluation strategy for machine translation systems that can detect incorrect bilingual transformations (iBTs). It also describes a method for determining the causes of the iBTs in a specific MT system, and a dedicated evaluation of ordering errors.
In summary, the main contributions of this dissertation are as follows. For macro-evaluation, it improves evaluation methods by adopting a new similarity measure, machine learning, and reference extension, offering new ideas for the key technologies of macro-evaluation. For micro-evaluation, it does pioneering work on automatic diagnosis, presenting complete automatic diagnosis systems for machine translation researchers from both the monolingual and the bilingual perspective. These diagnostic methods promote the research and development of machine translation and provide a reference for future micro-evaluation studies.
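The skip-N-gram idea behind SNR (item 1) can be illustrated with a small sketch. This is not the dissertation's implementation: the exact SNR statistics are not given in the abstract, so the clipped-count precision/recall/F1, the span definition, and the (n, max_span) settings below are assumptions, and all function names are hypothetical. Each configuration contributes three statistics to a feature vector whose weights, per the abstract, are tuned by SVM regression.

```python
from collections import Counter
from itertools import combinations


def skip_ngrams(tokens, n=2, max_span=4):
    """Count ordered n-tuples of tokens (gaps allowed) whose first and last
    positions are at most max_span apart. n=2, max_span=1 gives plain bigrams."""
    counts = Counter()
    for idx in combinations(range(len(tokens)), n):  # index tuples are increasing
        if idx[-1] - idx[0] <= max_span:
            counts[tuple(tokens[i] for i in idx)] += 1
    return counts


def skip_ngram_prf(hyp, ref, n=2, max_span=4):
    """Clipped-match precision, recall and F1 of skip-n-grams (assumed statistics)."""
    h, r = skip_ngrams(hyp, n, max_span), skip_ngrams(ref, n, max_span)
    matched = sum((h & r).values())  # Counter & keeps the minimum count per key
    p = matched / sum(h.values()) if h else 0.0
    rec = matched / sum(r.values()) if r else 0.0
    f = 2 * p * rec / (p + rec) if p + rec else 0.0
    return p, rec, f


# Hypothetical settings: each (n, max_span) pair adds (P, R, F) to the features.
CONFIGS = ((2, 1), (2, 4), (3, 4))


def snr_like_features(hyp, ref, configs=CONFIGS):
    """Concatenate (P, R, F) for every configuration into one feature vector."""
    features = []
    for n, span in configs:
        features.extend(skip_ngram_prf(hyp, ref, n, span))
    return features
```

For example, `skip_ngram_prf("cat the sat".split(), "the cat sat".split())` matches two of three skip-bigrams in each direction, giving precision, recall and F1 of about 0.67 even though the plain bigram overlap is zero. To mimic the weight tuning, one could fit a support vector regressor (for instance scikit-learn's `SVR`) on such feature vectors against human judgment scores; that pairing is an assumption, not the documented SNR setup.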
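The sentence-level system combination in item 1 computes a risk for each candidate in which the other candidates act as pseudo-references weighted by their system's development-set score. The sketch below is a minimum-Bayes-risk style selection under assumptions: the unigram F1 loss is a stand-in for whatever evaluation metric the research actually used, and `weighted_mbr_select` and its arguments are hypothetical names.

```python
from collections import Counter


def unigram_f1(a, b):
    """Stand-in similarity: clipped unigram F1 between two token lists."""
    overlap = sum((Counter(a) & Counter(b)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(a), overlap / len(b)
    return 2 * p * r / (p + r)


def weighted_mbr_select(hypotheses, system_scores, similarity=unigram_f1):
    """hypotheses: list of (system_id, tokens); system_scores: dev-set metric
    score per system. Returns the tokens with minimum expected loss, where each
    other hypothesis counts as a pseudo-reference weighted by its system score."""
    best, best_risk = None, float("inf")
    for i, (_, hyp) in enumerate(hypotheses):
        risk = 0.0
        for j, (sys_j, pseudo_ref) in enumerate(hypotheses):
            if i == j:
                continue  # a hypothesis does not vote for itself
            risk += system_scores[sys_j] * (1.0 - similarity(hyp, pseudo_ref))
        if risk < best_risk:  # ties keep the earlier hypothesis
            best, best_risk = hyp, risk
    return best
```

Normalizing the system scores would not change which hypothesis is selected, so the sketch leaves them raw. Raising one system's weight pulls the selection toward hypotheses that agree with that system's output, which is the effect the performance weighting is meant to have.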
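The syntax-based reference extension in item 2 identifies equivalent constituents across human references and recombines them. The abstract does not say how equivalence is decided, so the sketch below assumes a simple, hypothetical heuristic: two constituents from different references are treated as equivalent if they carry the same syntactic label and are flanked by the same left and right words. Parsing is assumed to be done already, and each new reference substitutes only one constituent.

```python
def constituent_contexts(tokens, constituents):
    """Map (label, left word, right word) -> list of (start, end) spans.
    constituents: (label, start, end) triples with end exclusive."""
    contexts = {}
    for label, start, end in constituents:
        left = tokens[start - 1] if start > 0 else "<s>"
        right = tokens[end] if end < len(tokens) else "</s>"
        contexts.setdefault((label, left, right), []).append((start, end))
    return contexts


def extend_references(refs):
    """refs: list of (tokens, constituents). Returns new reference token lists
    built by swapping one equivalent constituent between two references."""
    existing = {tuple(tokens) for tokens, _ in refs}
    new = set()
    for i, (tok_a, cons_a) in enumerate(refs):
        ctx_a = constituent_contexts(tok_a, cons_a)
        for j, (tok_b, cons_b) in enumerate(refs):
            if i == j:
                continue
            ctx_b = constituent_contexts(tok_b, cons_b)
            for key, spans_a in ctx_a.items():
                for sa, ea in spans_a:
                    for sb, eb in ctx_b.get(key, []):
                        cand = tuple(tok_a[:sa] + tok_b[sb:eb] + tok_a[ea:])
                        if cand not in existing:
                            new.add(cand)
    return sorted(list(c) for c in new)
```

With the references "the president visited china last week" and "the chairman visited china this past week", matching NP subjects and matching temporal NPs yield two new references: "the chairman visited china last week" and "the president visited china this past week".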
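The check-point idea in item 3 can be sketched as follows: each check-point pairs a linguistic category with the reference translation of one instance, the system output is scored on how much of that reference it recovers, and scores are averaged per category. The abstract does not give MCDI's scoring formula, so the n-gram recall below is an assumption, and `CheckPoint`, `checkpoint_recall`, and `category_scores` are hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class CheckPoint:
    category: str     # linguistic category label (hypothetical labels in usage)
    sentence_id: int  # index of the test sentence this check-point belongs to
    reference: list   # reference tokens for this linguistic instance


def ngrams(tokens, n):
    """Counter of contiguous n-grams; empty if the sequence is shorter than n."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def checkpoint_recall(cp, hyp, max_n=2):
    """Fraction of the check-point's reference n-grams (1..max_n) found in hyp."""
    matched = total = 0
    for n in range(1, max_n + 1):
        ref = ngrams(cp.reference, n)
        matched += sum((ref & ngrams(hyp, n)).values())
        total += sum(ref.values())
    return matched / total if total else 0.0


def category_scores(checkpoints, hypotheses, max_n=2):
    """Average check-point recall per category; hypotheses maps id -> tokens."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cp in checkpoints:
        sums[cp.category] += checkpoint_recall(cp, hypotheses[cp.sentence_id], max_n)
        counts[cp.category] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}
```

A per-category breakdown like this is what makes the evaluation diagnostic: a low score for one category points developers at a specific linguistic phenomenon rather than at the system as a whole.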
Keywords/Search Tags: Machine translation, automatic evaluation, automatic diagnosis, system combination