
Research On Neural Machine Translation With Fusion Of Visual Features

Posted on: 2024-06-15    Degree: Master    Type: Thesis
Country: China    Candidate: T Y Zhang    Full Text: PDF
GTID: 2568307076973529    Subject: Software engineering
Abstract/Summary:
Machine translation that uses additional forms of input data (such as images, audio, or video) to assist the text is called multimodal machine translation. It can improve the accuracy and fluency of the generated translation and, at the same time, resolve ambiguities that traditional text-only machine translation cannot. This thesis focuses on image-based multimodal machine translation, where how to effectively use visual features extracted from images to assist text translation is still at an exploratory stage. Separately, simultaneous machine translation, a subtask of machine translation aimed at special application scenarios, suffers from insufficient source-side input information. Our analysis shows that visual features can be used to alleviate several of the problems in simultaneous machine translation. Taking the fusion of visual features as its central theme, this thesis carries out research on both of these points. The specific work is as follows:

(1) A future-information-guided multimodal machine translation method. We propose a translation model based on multimodal consistency, which exploits the semantic consistency among the source sentence, the image, and the target sentence, together with a prediction mechanism that captures future context information from the visual features, so that generation is guided by the future context contained in the image. Extensive experiments verify the positive effect of both the proposed model and the prediction mechanism.

(2) A simultaneous machine translation method that fuses visual features. This method uses the image as an auxiliary modality to compensate for the missing source-side information. The model is built on the wait-k strategy and the idea of layered attention, and is evaluated on multiple parallel corpora. The results show that the proposed method alleviates the problem of insufficient source-side information and improves translation quality while maintaining low latency.
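The fusion step described above can be illustrated with a minimal NumPy sketch: text hidden states attend over regional image features, and a sigmoid gate controls how much visual context is mixed into each text position. The function names, dimensions, and the specific gating form are illustrative assumptions, not the exact architecture used in the thesis.

```python
import numpy as np

def attention(q, k, v):
    # Scaled dot-product attention: each query attends over all keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def fuse_visual(text_states, image_feats, gate_w):
    # Text states attend over image-region features to pull in visual
    # context; a per-position scalar gate (from the concatenated text
    # and visual vectors) decides how much of it to add back in.
    visual_ctx = attention(text_states, image_feats, image_feats)
    gate_in = np.concatenate([text_states, visual_ctx], axis=-1) @ gate_w
    gate = 1.0 / (1.0 + np.exp(-gate_in))  # sigmoid, in (0, 1)
    return text_states + gate * visual_ctx

# Toy usage: 3 text positions and 5 image regions, hidden size 4.
rng = np.random.default_rng(0)
text = rng.normal(size=(3, 4))
image = rng.normal(size=(5, 4))
gate_w = rng.normal(size=(8, 1))  # maps [text; visual] concat to a scalar
fused = fuse_visual(text, image, gate_w)
```

The gate lets the model fall back to the text-only representation when the image is uninformative, which is one common design for multimodal fusion.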
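The wait-k strategy mentioned in point (2) follows a fixed read/write schedule: the decoder first reads k source tokens, then alternates between emitting one target token and reading one more source token until the source is exhausted. A minimal sketch of that schedule (independent of any model) is:

```python
def wait_k_actions(k, src_len, tgt_len):
    """Return the READ/WRITE action sequence of the wait-k policy.

    The decoder may emit target token t only after min(k + t, src_len)
    source tokens have been read, so latency stays roughly k tokens.
    """
    actions = []
    read, written = 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions

# wait-2 on a 4-token source / 4-token target:
# READ READ WRITE READ WRITE READ WRITE WRITE
schedule = wait_k_actions(2, 4, 4)
```

Because the source is incomplete when early target tokens are written, the decoder lacks future source context at exactly those steps, which is the gap the thesis proposes to fill with visual features.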
Keywords/Search Tags:Machine translation, Simultaneous machine translation, Multimodal, Visual features