Video-based Sign Language Recognition With Deep Learning

Posted on: 2021-04-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Pu
Full Text: PDF
GTID: 1368330602494247
Subject: Information and Communication Engineering
Abstract/Summary:
Sign language (SL) is one of the most important means of communication in the deaf-mute community. Automatic sign language recognition (SLR) has emerged to enable barrier-free communication between hearing people and this community. SLR is a representative interdisciplinary task that aims to automatically translate sign videos into natural language for easy understanding. Research on SLR involves computer vision, natural language processing, multimedia analysis, and related fields. Although deep learning has achieved great success in SLR in recent years, challenges remain. First, sign language is expressed through the appearance and motion of both hands, so extracting discriminative feature representations for hand appearance and motion trajectory is a pressing problem. Second, because labelling is expensive, there is no accurate alignment label between sign video and text annotation, which makes it difficult to optimize a deep learning-based SLR network in a conventional end-to-end way. To address these issues, this thesis proposes a series of techniques based on deep learning. The main work and innovations are as follows:

(1) We propose a multimodal feature extraction method that represents hand appearance and motion trajectory for isolated sign language recognition. The architecture consists of two branches, one for hand representation and one for trajectory representation. Hand appearance features are extracted with a 3D convolutional neural network. For trajectories, we describe each joint with shape context and combine all joints into a dense feature matrix, to which a convolutional neural network is applied to generate a robust representation. The two kinds of features are combined and classified with a support vector machine.

(2) We propose a new sign language recognition network based on a 3D residual network. It uses dilated convolutions for sequential modeling of continuous sign language, which effectively accelerates computation in the inference phase. In addition, the method alleviates the long-term dependency problem caused by recurrent neural networks. An iterative optimization strategy is adopted to improve the representation capacity of the visual feature extractor.

(3) We propose an alignment network with iterative optimization for continuous sign language recognition. Two different kinds of decoders, i.e., a CTC decoder and an LSTM decoder, are embedded into a unified network. Both decoders are jointly optimized under the maximum likelihood criterion with a soft Dynamic Time Warping (soft-DTW) alignment constraint. Based on the warping path, we propose an iterative optimization algorithm for better performance. In the inference stage, the CTC decoder uses beam search to obtain a set of complete hypothesis sentences as candidates. We re-rank the candidates using both the CTC and LSTM decoders, and select the hypothesis with the maximum joint probability as the final result.

(4) We propose continuous sign language recognition algorithms based on data augmentation learning. On the one hand, we augment the sign language dataset by editing each sign video and its corresponding text label with the three operations used in calculating word error rate (WER), i.e., substitution, deletion, and insertion. The generated data is used to explore the relationship between the raw data modalities via cross-modality learning. On the other hand, for multilingual settings, we propose a unified framework for multilingual sign language recognition with joint training. The dataset is greatly expanded while multiple sign languages are recognized. The proposed method outperforms models trained individually in the single-language setting.
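
The three edit operations named in contribution (4), substitution, deletion, and insertion, are the ones counted when WER is calculated. As a rough illustration only, the sketch below computes WER with a standard edit-distance alignment over word (gloss) sequences. It is not the thesis's augmentation procedure, which the abstract does not detail; the function names `wer_counts` and `wer` and the toy inputs are assumptions made for this example.

```python
from typing import List, Tuple


def wer_counts(reference: List[str], hypothesis: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for one minimum-cost alignment.

    Hypothetical helper for illustration; when several alignments share the
    minimum cost, the diagonal (match/substitution) move is preferred.
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + diff,  # match or substitution
                cost[i - 1][j] + 1,         # deletion
                cost[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right corner and count each operation.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                subs += diff
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def wer(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = wer_counts(reference, hypothesis)
    return (subs + dels + ins) / len(reference)
```

For example, `wer("A B C D".split(), "A C D".split())` gives 0.25: one deletion over four reference words.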
Keywords/Search Tags:Sign language recognition, Representation learning, Sequential modeling, 3D convolutional neural networks, Recurrent neural networks, Iterative refinement, Connectionist temporal classification, Multilingual recognition