
Continuous Sign Language Recognition Via Reinforcement Learning

Posted on: 2020-12-02   Degree: Master   Type: Thesis
Country: China   Candidate: Z H Zhang   Full Text: PDF
GTID: 2428330578483122   Subject: Information and Communication Engineering
Abstract/Summary:
Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate. Sign languages convey information through gestures, hand motions, and even facial expressions. Communication barriers exist between deaf and hearing people, and even among deaf people themselves, which makes the automatic translation of sign language meaningful and important.

Continuous sign language recognition (CSLR) aims at translating sign videos into text sentences. It requires a fine-grained understanding of the gestures, hand motions, and facial expressions in a video. In addition, there is a semantic gap between videos and sentences, and aligning them at the frame or word level is difficult.

To address these challenges, we propose our CSLR model. First, we adopt a combined 3D residual convolutional neural network (3D-ResNet) to extract visual features from sign videos. Second, we adopt the Transformer to bridge the semantic gap between sign videos and target sentences.

However, training the Transformer with supervised learning (SL) has intrinsic defects. The Transformer is typically trained to maximize the likelihood of the next ground-truth word given the previous ground-truth words, whereas at test time the model uses its own previously generated words to predict the next word. In addition, the optimization objective used during training deviates from the non-differentiable evaluation metrics used during testing. To avoid these intrinsic defects, we employ reinforcement learning (RL) to train our CSLR model.

In this thesis, we propose three novel techniques for CSLR:

· We propose a novel framework based on 3D-ResNet and the Transformer for continuous sign language recognition (CSLR). To the best of our knowledge, we are the first to deploy the Transformer for sequence learning in CSLR.

· We propose a "self-critic" policy gradient algorithm for CSLR. Our CSLR model based on 3D-ResNet and the Transformer is formulated as an RL problem, and we employ a policy gradient algorithm to train the policy network. Moreover, to reduce variance during training, we introduce two policy gradient algorithms with different baselines, which are calculated from different generated sentences.

· We propose an "actor-critic" policy gradient algorithm for CSLR. We employ a value network to approximate the state value function so as to improve the accuracy of the baseline in the policy gradient algorithm. We then introduce independent, alternate, and hybrid optimization strategies for the policy network and the value network in turn; in the hybrid strategy, the policy network and the value network form an "actor-critic" architecture.

We conduct our experiments on the RWTH-PHOENIX-Weather dataset. The experiments demonstrate the effectiveness of our CSLR model and of the RL-based optimization strategies.
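
The idea of a policy gradient with a sentence-level baseline can be illustrated with a minimal sketch. The abstract does not state which reward or which baselines the thesis uses, so everything specific here is an assumption: the reward is taken to be the negative word error rate (WER) against the reference sentence, and the baseline is the reward of a greedily decoded sentence, as in the commonly used self-critical formulation. The function names `edit_distance`, `reward`, and `self_critical_loss` are hypothetical and are not taken from the thesis.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference word
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def reward(ref, hyp):
    """Assumed reward: negative word error rate, so higher is better."""
    return -edit_distance(ref, hyp) / max(len(ref), 1)


def self_critical_loss(sampled_log_probs, sampled, greedy, ref):
    """REINFORCE-style loss with the greedy sentence's reward as baseline.

    sampled_log_probs: log-probabilities the policy assigned to each
    token of the sampled sentence (assumed to come from the decoder).
    """
    advantage = reward(ref, sampled) - reward(ref, greedy)
    # Minimizing this raises the likelihood of sampled sentences that
    # beat the baseline and lowers it for those that fall short.
    return -advantage * sum(sampled_log_probs)
```

In a real training loop the log-probabilities would be differentiable tensors produced by the Transformer decoder, and the loss would be averaged over a batch. Because the reward is computed on whole generated sentences, this objective uses the model's own outputs during training and optimizes the evaluation metric directly, which are the two supervised-learning defects the abstract describes.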
Keywords/Search Tags: Sign Language Recognition, 3D-ResNet, Transformer, Reinforcement Learning, Policy Gradient