Analysis And Application Research On Bilingual Maximal-length Noun Phrase

Posted on: 2016-03-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y G Li
GTID: 1108330503953425
Subject: Computer application technology
Abstract/Summary:
This thesis discusses the identification and alignment of maximal-length noun phrases and their application in statistical machine translation (SMT). Machine translation (MT) is in essence a language problem, and translation must draw on linguistic knowledge. Studying how to integrate meaningful syntactic-level linguistic knowledge effectively therefore has both theoretical significance and practical value for the development of statistical machine translation. After examining the structural characteristics of Chinese and English maximal-length noun phrases, which carry rich syntactic and semantic information, and starting from the practical needs of statistical machine translation, the thesis addresses the identification, alignment, and extension of bilingual maximal-length noun phrases. It also discusses the possibility of integrating bilingual maximal-length noun phrases into a statistical translation model. The major contents comprise the following four parts:

(1) A fusion algorithm of bidirectional labeling is proposed for identifying Chinese maximal-length noun phrases with hybrid features. Building on an analysis of existing methods, the particular characteristics of Chinese, and the properties of sequence labeling based on support vector machines (SVM), the thesis studies the adaptability of a fusion algorithm with hybrid features. Theoretical analysis and experiments show that labeling over hybrid units of words and base chunks is effective for identifying Chinese maximal-length noun phrases. In addition, the labeling results obtained in the two directions are complementary. A fusion algorithm of bidirectional labeling based on the “boundary fork” exploits this complementarity and achieves high fusion accuracy.
The method reaches an F-1 value of 88.24%, which is 2.34% higher than the baseline.

(2) An integrated algorithm for the identification and alignment of Chinese-English maximal-length noun phrases is proposed and designed. Based on a structural analysis of Chinese-English maximal-length noun phrases, the algorithm exploits the complementarity between the identification results in the two languages. A model that integrates the identification and alignment of bilingual maximal-length noun phrases is established, so that the two tasks benefit each other. Experimental results show that the algorithm clearly improves both the identification and the alignment of bilingual maximal-length noun phrases. It reaches an F-1 value of 81.91%, which is more than 10% higher than the parsing-based method.

(3) A training algorithm for bilingual maximal-length noun phrases based on bilingual co-training is proposed. To improve the recognition performance and domain adaptability of bilingual maximal-length noun phrase identification, a bilingual co-training algorithm is proposed, and the choice of incremental tags is described in detail. Bilingual co-training differs from ordinary co-training: it treats the Chinese and English sentences as two views of one data set. During training, the algorithm integrates Chinese-English linguistic features and the complementarity between Chinese and English maximal-length noun phrases. Involving cross-domain data enhances the domain adaptability of the identification algorithm, which is also meaningful for large-scale cross-domain language processing tasks in statistical machine translation. Experiments show that the algorithm improves both the identification of bilingual maximal-length noun phrases and its domain adaptability.
As a result, the F-1 value is improved by 4.52%.

(4) A statistical machine translation model that integrates bilingual maximal-length noun phrases is proposed and implemented. Three strategies of increasing complexity are proposed, so that translation quality can be improved step by step. Among them, Method-III applies a “divide and conquer” strategy and integrates maximal-length noun phrases into statistical machine translation as a “hard constraint”. At the level of maximal-length noun phrases, these strategies also combine the advantages of phrase-based and hierarchical phrase-based statistical machine translation, so the output of the translation system improves clearly, by up to 3.03 BLEU points over a competitive baseline on long and complicated sentences.

The work described in this thesis focuses on the identification of Chinese maximal-length noun phrases, the identification and alignment of bilingual maximal-length noun phrases, and the domain adaptability of bilingual maximal-length noun phrase identification. Together, this work markedly increased the efficiency, accuracy, and domain adaptability of bilingual maximal-length noun phrase processing. A statistical machine translation model integrating bilingual maximal-length noun phrases was established, which clearly improved the output of the translation system.
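
The identification results in this abstract are reported as F-1 values. As a minimal sketch of how such a score is commonly computed for phrase identification, the Python below assumes phrases are encoded as B/I/O tags per token and are scored by exact span match. The abstract does not state the tagging scheme or the matching criterion, so both are assumptions, and the function names `bio_to_spans` and `span_f1` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect phrase spans from a B/I/O tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A phrase begins here; close the previous phrase if one is open.
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O":
            # Outside any phrase; close the open phrase if there is one.
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def span_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F-1 over exactly matching spans."""
    gold_spans = bio_to_spans(gold)
    pred_spans = bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags only, not data from the thesis.
gold = ["B", "I", "I", "O", "B", "I", "O"]
pred = ["B", "I", "I", "O", "O", "B", "O"]
precision, recall, f1 = span_f1(gold, pred)
```

Under exact matching, a predicted phrase with even one wrong boundary counts as both a false positive and a missed phrase, which is why boundary errors weigh heavily when phrases are long.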
Keywords/Search Tags: maximal-length noun phrase, bilingual maximal-length noun phrase, sequence labeling, integration model, bilingual co-training, statistical machine translation