Semi-Supervised Discriminative English-Chinese Word Alignment

Posted on: 2008-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: S J Liu
Full Text: PDF
GTID: 2155360245497865
Subject: Computer Science and Technology
Abstract/Summary:
Word alignment is the process of finding translation correspondences between the words of a source sentence and those of a target sentence. Because it underlies other translation relations, word alignment is a basic technology for cross-language processing and has long attracted researchers' attention.

After summarizing the main approaches to word alignment and analyzing recent developments, this thesis introduces a semi-supervised discriminative word alignment method to improve alignment performance between English and Chinese sentences. It also explores a multi-feature strategy that combines statistical features with experience-based features. Finally, the technique is applied in the bilingual sentence retrieval system "OASIS" to address the redundant candidate words and low retrieval efficiency found in similar systems. In detail, the thesis consists of the following parts.

First, the thesis introduces the semi-supervised model for discriminative English-Chinese word alignment, focusing on the EMD semi-supervised training algorithm. By combining the advantages of EM training and discriminative training, this algorithm tunes both the features and the feature weights of the model, improving discriminative word alignment performance. The thesis also introduces an N-best decoding algorithm, which keeps more candidates for extension and thereby reduces search errors and improves alignment performance.

Second, the thesis introduces, level by level, the features commonly used in word alignment research and analyzes how they are classified. Based on this analysis, it selects dictionary similarity and POS translation probability as representatives of purely experience-based features and statistical-experiential features, respectively. Both features are added to a discriminative system built on purely statistical features, and their effects are analyzed. The results show that adding the purely experience-based feature improves system performance more than adding the statistical-experiential feature.

Third, building on the common discriminative model, the thesis adds three new features, takes recall as the optimization goal, and obtains word alignments with 96% recall. Applying this word alignment to word translation retrieval, the word translation retrieval system "OASIS" is implemented. It obtains the relations between a source word and candidate target words with high recall and efficiency. Practical use shows that the system greatly reduces noise and improves lexicographers' efficiency.
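
The idea of N-best decoding over a feature-weighted discriminative model can be illustrated with a small sketch. This is not the thesis's EMD algorithm or its actual feature set: the feature names `dict_sim` and `pos_prob` only echo the dictionary similarity and POS translation probability mentioned above, and the feature values, the weights, the `bias` feature, and the conflict penalty are all assumptions made for illustration. The sketch keeps the `n` best partial alignments at each step instead of a single greedy choice, then measures recall against a hypothetical gold alignment.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)
Hypothesis = Tuple[float, FrozenSet[Link]]  # (score, set of chosen links)


def link_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear model: weighted sum of feature values for one candidate link."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def n_best_align(
    link_features: Dict[Link, Dict[str, float]],
    weights: Dict[str, float],
    n: int = 3,
    conflict_penalty: float = 1.0,  # assumed penalty, not from the thesis
) -> List[Hypothesis]:
    """Beam search over candidate links, keeping the n best hypotheses."""
    beam: List[Hypothesis] = [(0.0, frozenset())]
    for link in sorted(link_features):
        base = link_score(link_features[link], weights)
        expanded: List[Hypothesis] = []
        for score, links in beam:
            # Option 1: leave this candidate link out.
            expanded.append((score, links))
            # Option 2: add it, penalizing links that share a source or target word.
            conflicts = sum(1 for (s, t) in links if s == link[0] or t == link[1])
            expanded.append(
                (score + base - conflict_penalty * conflicts, links | {link})
            )
        expanded.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = expanded[:n]  # keep more than one candidate to reduce search errors
    return beam


def recall(predicted: Set[Link], gold: Set[Link]) -> float:
    """Fraction of gold links that the predicted alignment recovers."""
    return len(predicted & gold) / len(gold) if gold else 1.0


# Hypothetical two-word sentence pair with invented feature values.
weights = {"dict_sim": 2.0, "pos_prob": 1.0, "bias": -1.0}
link_features = {
    (0, 0): {"dict_sim": 1.0, "pos_prob": 0.8, "bias": 1.0},
    (0, 1): {"dict_sim": 0.0, "pos_prob": 0.1, "bias": 1.0},
    (1, 0): {"dict_sim": 0.0, "pos_prob": 0.2, "bias": 1.0},
    (1, 1): {"dict_sim": 1.0, "pos_prob": 0.9, "bias": 1.0},
}
best_score, best_links = n_best_align(link_features, weights, n=3)[0]
print(sorted(best_links), recall(set(best_links), {(0, 0), (1, 1)}))
```

Raising `n` widens the beam at the cost of more hypotheses to score per step. In a recall-oriented setting like the one described above, one might also lower the `bias` penalty so that more links are proposed, though how the thesis actually tunes for recall is not stated in the abstract.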
Keywords/Search Tags: word alignment, discriminative model, MER, semi-supervised training, EMD