Sign Language (SL), as a special visual natural language, relies on multi-channel information such as manual and non-manual features to convey linguistic information. In recent years, Sign Language Translation (SLT), an important application for bridging the communication gap between the deaf and the hearing, has attracted widespread academic attention, and SLT based on the neural machine translation framework is an emerging research direction driven by advances in artificial intelligence. We found that, within existing research frameworks, it is difficult to deeply mine the implicit linguistic features of sign language as a special natural language in a weakly supervised manner. To this end, we propose improvements from two perspectives: semantic heuristics and visual heuristics. From the perspective of semantic heuristics, we believe that introducing additional word-level semantic knowledge from sign language linguistics can help improve sign language translation. However, this idea requires modeling solutions to problems such as sign language segmentation, multi-modal fusion, and sequence alignment. Hence, we propose a knowledge-based multi-modal feature fusion encoder for a dynamic-graph sign language translation model. To the best of our knowledge, this is the first time the concept of graph neural networks has been introduced into neural sign language translation. In the graph neural sign language translation model, we design a novel multi-modal graph embedding module that quantifies sign language visual features and sign gloss features, so that the multi-modal encoder can fuse the graph network with multi-modal features. From the perspective of visual heuristics, we found that the input video sequence contains a large amount of redundant information. This redundancy generally appears as similar frames in the temporal neighborhood, especially in longer sentences. Redundant frames not only occupy space, consume memory, and introduce considerable noise, but also increase the complexity of the graph neural network. To this end, we introduce a Frame Stream Density Compression algorithm, which effectively reduces the redundancy of input frames and the number of invalid graph nodes in an unsupervised manner, increasing the density of the effective information flow. This method is also instructive for other low-resource data processing tasks. We conducted experiments on RWTH-PHOENIX-Weather 2014T, a publicly available and widely used sign language translation dataset, to verify the proposed methods. Experiments show that our optimized models outperform the state-of-the-art baseline model.
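The abstract does not specify how the Frame Stream Density Compression algorithm works internally; as a rough illustration of the underlying idea only (dropping near-duplicate neighboring frames without supervision), here is a minimal sketch. The representation of frames as plain feature vectors, the cosine-similarity criterion, and the `threshold` parameter are all assumptions for illustration, not the paper's actual method:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two frame feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def compress_frame_stream(frames, threshold=0.95):
    # Keep a frame only if it is sufficiently dissimilar from the
    # last kept frame; near-duplicates in the temporal neighborhood
    # are discarded, so no frame-level labels are needed.
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        if cosine_similarity(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept

# Two nearly identical opening frames collapse into one,
# while the clearly different third frame is retained.
stream = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
compressed = compress_frame_stream(stream)  # → [[1.0, 0.0], [0.0, 1.0]]
```

A greedy pass like this keeps the earliest frame of each run of similar frames, shortening the sequence fed to the graph network; the similarity measure and threshold would in practice be tuned to the visual features used.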