
Study On Key Techniques Of Sign Language Recognition Based On Deep Learning

Posted on: 2020-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Li
Full Text: PDF
GTID: 2428330596477310
Subject: Control Science and Engineering
Abstract/Summary:
Sign language recognition uses pattern recognition and related technologies to translate sign language, which is difficult for non-signers to understand, into text and speech, providing a bridge for communication between deaf people and hearing people. As an important mode of human-computer interaction, sign language gesture recognition is also of great significance in the current era of intelligent systems. Sign language recognition involves several key technologies and difficulties: data acquisition and pre-processing; extraction of highly discriminative sign language features; excessive redundant information in sign language data (transitional frames, static frames, and excessive spatial background); and temporal segmentation in continuous sign language recognition. To address these, this thesis applies deep learning methods to sign language recognition and focuses on its key techniques. The main research work and achievements are as follows:

1. A Kinect-based sign language data acquisition system is designed and built. The system synchronously acquires color images, depth images, and human skeleton point coordinates conveniently, efficiently, and reliably. Using this system, a Chinese daily sign language data set is constructed, containing 60 daily sign language words and 30 continuous sign language sentences composed of these words. The data set has 66600 sign language samples, each containing three types of data: color video, depth video, and skeleton point coordinates.

2. A sign language recognition algorithm based on multi-modal long- and short-term spatio-temporal feature fusion is proposed. To extract more discriminative sign language features while avoiding the tedious steps of hand detection, hand segmentation, and manual feature design, the sign language data is divided into several segments, and the short-term spatio-temporal features of each segment in the color video and depth video are extracted by a three-dimensional convolutional neural network. At the same time, a powerful feature representation of the hand trajectory segments is obtained by combining shape context with LeNet. The three types of features of the same segment are then fused and fed into an LSTM network for time series modeling, which further fuses the features to obtain highly discriminative multi-modal long-term spatio-temporal features. Finally, the features are mapped to the sample classification space and classified by a softmax classifier.

3. To address the characteristics of sign language expression and the redundancy in sign language data, the attention mechanism is introduced into the above recognition framework, and two algorithms are proposed: a sign language recognition algorithm based on a modal attention mechanism, and a sign language recognition algorithm based on a spatio-temporal attention mechanism. The attention mechanism lets the recognition model focus on the important information in the sign language data, thereby improving its accuracy.

4. To address the difficulty of temporal segmentation in continuous sign language recognition, a continuous sign language recognition algorithm based on an Attention-CTC fusion model is proposed. Connectionist temporal classification (CTC) is introduced into the sign language recognition model based on the spatial attention mechanism to align the input sign language data sequence with the label sequence automatically, thus avoiding explicit temporal segmentation. In addition, an attention mechanism is added on top of CTC, which relaxes the conditional independence assumption of CTC to a certain extent and improves the accuracy of continuous sign language recognition.
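
The way CTC avoids explicit temporal segmentation can be illustrated with its collapsing rule: the network emits one label (or a special blank) per input step, and the output sentence is obtained by merging consecutive repeats and then dropping blanks. The following is a minimal sketch of that rule with best-path decoding, not the thesis's implementation; the blank index, the three-word vocabulary, and the function names `ctc_collapse` and `greedy_decode` are assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (hypothetical choice)


def ctc_collapse(step_labels, blank=BLANK):
    """Apply the CTC collapsing rule to a per-step label path.

    Consecutive repeats are merged first, then blanks are removed.
    """
    merged = [label for label, _ in groupby(step_labels)]
    return [label for label in merged if label != blank]


def greedy_decode(scores, blank=BLANK):
    """Best-path decoding: take the top-scoring label at each step, then collapse."""
    path = [max(range(len(step)), key=step.__getitem__) for step in scores]
    return ctc_collapse(path, blank)


# Hypothetical vocabulary: 0 = blank, 1 = "I", 2 = "go", 3 = "school".
# Eleven input steps collapse to a three-word sentence without any
# hand-marked word boundaries.
path = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_collapse(path))  # [1, 2, 3]
```

A blank between two identical labels keeps them distinct, so `[1, 0, 1]` collapses to `[1, 1]`; this is how a repeated word survives the merge step.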
Keywords/Search Tags: sign language recognition, deep learning, multimodal fusion, attention mechanism, connectionist temporal classification