| With the rapid development of Internet technology, cross-border e-commerce has become an important channel for international trade, resulting in an increasing demand for machine translation of commodity information. Terminologies in commodity information sentences convey key information, so translating them accurately is essential for communicating commodity information completely. The current mainstream practice in academia and industry is to build a customized machine translation system for the e-commerce domain based on a bilingual e-commerce terminology dictionary. However, the construction of bilingual terminologies in the e-commerce domain relies entirely on human translation, which is costly and inefficient. To solve this problem, this thesis proposes two classes of methods to automatically construct bilingual terminologies: extraction and generation. The constructed bilingual terminologies are then applied to a customized machine translation system for e-commerce to improve the translation quality of commodity information. (1) Bilingual E-Commercial Terminology Extraction Based on Cross-lingual Pre-training. This thesis proposes an extractive task to construct bilingual terminologies in e-commerce: given a terminology in the source language and a sentence in the target language, the model automatically identifies and extracts the corresponding terminology in the target language. To tackle the task, this thesis uses cross-lingual pre-training that incorporates e-commerce terminology information, and exploits the deep semantic relationship between the source terminology and the target sentence to identify and extract the target terminology, forming a bilingual terminology pair. This thesis also constructs a bilingual terminology extraction data set covering multiple commodity categories for Chinese-English and English-French in the e-commerce domain. Experimental results show that the extraction method proposed in this thesis performs 
significantly better than various strong baseline methods. (2) Bilingual E-Commercial Terminology Generation Based on Domain Information Fusion. This thesis proposes a generative task to construct bilingual terminologies in e-commerce: given a source terminology, the translation model directly generates the corresponding target terminology. This thesis defines the task for the first time and divides it into supervised and unsupervised terminology translation tasks in the e-commerce domain, simulating language pairs with abundant and scarce bilingual terminology resources, respectively, and constructs the corresponding data sets. On this basis, this thesis proposes a bilingual e-commerce terminology translation method based on the fusion of domain information, which fully integrates the domain information contained in the parallel corpus in the news domain and the pseudo-parallel corpus in the e-commerce domain, and improves the model’s performance through iterative back-translation. Experimental results show that the proposed terminology translation method significantly outperforms various baselines on both supervised and unsupervised terminology translation tasks. (3) Customized Machine Translation for E-Commerce with Embedded Bilingual Terminologies. This thesis applies the constructed bilingual terminologies to a commercial customized machine translation system for e-commerce to improve the translation quality of commodity information, thereby verifying the practical significance of the automatic construction methods proposed in this thesis. To this end, this thesis constructs a test set for translating commodity information sentences for multiple language pairs in the e-commerce domain, and augments the training data through code-switching. Using a pointer network and shared word embedding representations, bilingual terminology information is integrated into the e-commerce customized 
machine translation system. This optimizes the decoding process of the translation model, thereby improving both the overall translation quality of product information sentences and the translation quality of the terminologies they contain. Experimental results show that incorporating the bilingual terminologies constructed in this thesis greatly improves the overall translation performance of the e-commerce customized machine translation system on product information sentences, and further improves the translation quality of the terminologies in those sentences. This demonstrates the practical significance of the automatic construction methods for e-commerce bilingual terminologies proposed in this thesis. |
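
The abstract mentions training data enhancement through code-switching without giving details. As a rough illustration only, the sketch below assumes one common form of the technique: source-side terms found in a bilingual terminology dictionary are replaced with their target-language translations, so the translation model sees the target term in context. The function name `code_switch`, the parameters `max_len` and `prob`, and the English-French dictionary entries are all hypothetical and are not taken from the thesis.

```python
import random
from typing import Dict, List, Optional


def code_switch(
    tokens: List[str],
    term_dict: Dict[str, str],
    max_len: int = 4,
    prob: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace source terms with target translations (illustrative sketch).

    Scans left to right and prefers the longest dictionary match at each
    position, so "phone case" wins over "phone". Inserted target tokens are
    never rescanned, which avoids replacing text inside a translation.
    """
    rng = rng or random.Random(0)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            # Each match is switched with probability `prob`.
            if span in term_dict and rng.random() < prob:
                out.extend(term_dict[span].split())
                i += n
                matched = True
                break
        if not matched:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical English-French terminology entries, for illustration only.
terms = {"phone case": "coque de téléphone", "phone": "téléphone"}
sentence = "waterproof phone case for hiking".split()
print(" ".join(code_switch(sentence, terms)))
```

Setting `prob` below 1.0 leaves some terms unswitched, so the augmented corpus mixes original and code-switched sentences. Whether the thesis uses a mixing ratio of this kind is not stated in the abstract.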