Study On Key Techniques Of Sign Language Recognition Based On Deep Learning

Posted on:2020-10-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y W Li

Full Text:PDF

GTID:2428330596477310

Subject:Control Science and Engineering

Abstract/Summary:

Sign language recognition translates sign language into text and speech by means of pattern recognition and related technologies, providing a bridge for communication between deaf and mute people and hearing people. As an important mode of human-computer interaction, sign language gesture recognition is also of great significance in the current era of intelligent systems. Sign language recognition involves several key technologies and difficulties: data acquisition and pre-processing; extraction of highly discriminative sign language features; excessive redundant information in sign language data (transitional frames, static frames and excessive spatial background); and temporal segmentation in continuous sign language recognition. To address these, this thesis introduces deep learning methods into sign language recognition research and focuses on the key techniques of sign language recognition based on deep learning. The main work and contributions are as follows:

1. A Kinect-based sign language data acquisition system is designed and built. The system synchronously acquires color images, depth images and the coordinates of human skeleton points conveniently, efficiently and reliably. Using this system, a Chinese daily sign language dataset is constructed, containing 60 daily sign language words and 30 continuous sign language sentences composed of these words. The dataset has 66600 sign language samples, each comprising three types of data: color video, depth video and skeleton point coordinates.

2. A sign language recognition algorithm based on multimodal fusion of long-term and short-term spatio-temporal features is proposed. To extract more discriminative sign language features while avoiding the tedious steps of hand detection, hand segmentation and manual feature design, the sign language data is divided into several segments, and the short-term spatio-temporal features of each segment in the color video and the depth video are extracted by a three-dimensional convolutional neural network. At the same time, a powerful feature representation of the hand trajectory segments is obtained by combining shape context with LeNet. The three types of features of the same segment are then fused and fed into an LSTM network for temporal modeling, which further fuses the features to obtain highly discriminative multimodal long-term spatio-temporal features. Finally, the features are mapped to the sample classification space and classified by a softmax classifier.

3. To account for the characteristics of sign language expression and the redundancy in sign language data, the attention mechanism is introduced into the above recognition framework, and two algorithms are proposed: a sign language recognition algorithm based on a modality attention mechanism, and a sign language recognition algorithm based on a spatio-temporal attention mechanism. The attention mechanism lets the recognition model focus on the important information in the sign language data, thereby improving its accuracy.

4. To address the difficulty of temporal segmentation in continuous sign language recognition, a continuous sign language recognition algorithm based on an Attention-CTC fusion model is proposed. Connectionist temporal classification (CTC) is introduced into the sign language recognition model based on the spatial attention mechanism to align the input sign language data sequence with the label sequence automatically, thus avoiding explicit temporal segmentation. Moreover, an attention mechanism is added on top of CTC, which relaxes the conditional independence assumption of CTC to a certain extent and improves the accuracy of continuous sign language recognition.
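Two of the ideas named in the abstract, attention-weighted fusion of per-modality features and CTC alignment without explicit temporal segmentation, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the blank index 0, the toy feature vectors and the hand-picked modality scores are all assumptions made for illustration, and the decoder shown is plain best-path CTC decoding rather than the Attention-CTC fusion model the abstract describes.

```python
import math
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(features, scores):
    """Attention-weighted sum of per-modality feature vectors.

    features: one equal-length vector per modality (e.g. color, depth, trajectory)
    scores:   one unnormalized relevance score per modality (hypothetical here;
              in a trained model they would come from a learned scoring layer)
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def ctc_greedy_decode(frame_probs, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


# Toy per-frame label distributions over {blank, word 1, word 2}.
frame_probs = [
    [0.10, 0.80, 0.10],  # word 1
    [0.20, 0.70, 0.10],  # word 1 (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.20, 0.70],  # word 2
    [0.10, 0.10, 0.80],  # word 2 (repeat, merged)
    [0.60, 0.30, 0.10],  # blank separates two genuine occurrences
    [0.10, 0.10, 0.80],  # word 2 again
]
decoded = ctc_greedy_decode(frame_probs)

# Equal scores give equal weights, so the fused vector is the plain average.
fused = fuse_modalities([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In the toy input, seven frames decode to the three-word sequence `[1, 2, 2]` without any frame being assigned to a word boundary by hand: repeated labels are merged, and the blank label is what allows the same word to appear twice in a row.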

Keywords/Search Tags:

sign language recognition, deep learning, multimodal fusion, attention mechanism, connectionist temporal classification

Related items

1	Chinese Sign Language Recognition For Large Vocabulary
2	Video-based Sign Language Recognition With Deep Learning
3	Non-specific Human Sign Language Recognition Based On Deep Learning
4	Research On Continuous Sign Language Translation Based On Temporal Neural Networks
5	Study On Attention Based Speech Emotion Recognition
6	Research On CTC-based And Attention-based End-to-end Speech Recognition
7	Research On End-to-End Speech Recognition Method Based On Self-Attention Mechanism
8	Multimodal Processing Technology For Video Analysis
9	Research On Dataset Building And Recognition Of Chinese Sign Language Light Field
10	Research On Speech Emotion Recognition Algorithm Based On Deep Learning