
Chinese Sign Language Recognition Based On Convolutional Network And Long Short Term Memory Network

Posted on: 2019-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: C S Mao
Full Text: PDF
GTID: 2428330542997951
Subject: Information and Communication Engineering
Abstract/Summary:
Sign language recognition (SLR) aims to translate acquired sign language into text or speech through human-computer interaction (HCI) technology. It makes communication more convenient for hearing-impaired people and offers children with congenital deafness an opportunity to receive education. Research on sign language recognition, and the construction of a complete, practical recognition system, can therefore support the study and daily life of the hearing-impaired and has significant social value. Moreover, as smart living increasingly shapes everyday life, vision-based SLR promises many practical benefits.

SLR is a temporal task, so temporal modeling is the key factor in recognition performance. Deep learning has recently achieved great progress and breakthroughs in computer vision: Convolutional Neural Networks (CNNs) have a strong ability to learn image representations, and Recurrent Neural Networks (RNNs) are good at modeling temporal information. In this thesis, we therefore collect our own Chinese sign language dataset with Kinect 2.0 and construct an SLR framework based on deep neural networks. The main contents of this thesis are as follows:

1. First, we examine the split property of, and contextual connections among, Chinese characters in sign language, and we refine the labels by using Chinese characters as the basic elements of our vocabulary. The proposed recognition method takes a feature sequence as input and produces a representation sequence as output, much like video captioning. We construct an SLR framework based on a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM): the CNN extracts the spatial features of the images, and LSTM networks build the encoder-decoder. The encoder takes the spatial features as input and extracts temporal features, while the decoder decodes the vocabulary elements.

2. Second, we introduce multi-modal fusion to further
improve the performance. We use the three-dimensional skeletal information acquired by Kinect 2.0 as a trajectory feature. Combining it with the image feature, we propose three fusion methods based on the framework above: feature fusion, model fusion with fixed weights, and model fusion with adaptive weights. The experimental results show that all three fusion methods improve recognition performance, reaching an accuracy of up to 97.7% with adaptive-weight model fusion.

3. Last but not least, we find that different frames of a sign language video contribute differently at decoding time: some frames express the meaning of the sign explicitly, while others are vague. To pay more attention to the most relevant frame information at each decoding step, we introduce an attention mechanism into the framework. With attention integrated into the encoder-decoder, performance improves further, to up to 98.2%.
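The thesis itself publishes no code, but the encoder-decoder structure described in contribution 1 can be outlined as a minimal NumPy sketch. Here the CNN feature extractor is replaced by random per-frame vectors, and all dimensions, the toy vocabulary size, and the greedy decoding loop are illustrative assumptions, not the thesis settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """One step of a minimal LSTM recurrence (single sample, no batching)."""
    def __init__(self, in_dim, hid_dim, rng):
        # Stacked weights for the input, forget, cell, and output gates.
        self.W = rng.standard_normal((4 * hid_dim, in_dim + hid_dim)) * 0.1
        self.b = np.zeros(4 * hid_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

rng = np.random.default_rng(0)
feat_dim, hid_dim, vocab = 64, 32, 10    # toy sizes, not the thesis settings
T_in, T_out = 20, 5                      # frames in, characters out

# Stand-in for the per-frame CNN spatial features of one sign video.
frames = rng.standard_normal((T_in, feat_dim))

encoder = LSTMCell(feat_dim, hid_dim, rng)
decoder = LSTMCell(hid_dim, hid_dim, rng)       # input: previous char embedding
W_out = rng.standard_normal((vocab, hid_dim)) * 0.1
embed = rng.standard_normal((vocab, hid_dim)) * 0.1

# Encoder: fold the spatial feature sequence into a temporal summary state.
h = c = np.zeros(hid_dim)
for x in frames:
    h, c = encoder.step(x, h, c)

# Decoder: greedily emit vocabulary elements (characters) one at a time.
tokens, prev = [], np.zeros(hid_dim)            # zero vector as <start>
dh, dc = h, c                                   # initialise from encoder state
for _ in range(T_out):
    dh, dc = decoder.step(prev, dh, dc)
    tok = int(np.argmax(W_out @ dh))
    tokens.append(tok)
    prev = embed[tok]

print(tokens)  # a length-5 sequence of character ids
```

With untrained random weights the emitted ids are meaningless; the sketch only shows the data flow (spatial features → encoder state → stepwise character decoding) that training would make useful.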
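The abstract does not state how the adaptive fusion weights are computed. The sketch below assumes one plausible scheme — weighting each modality's class posterior by its own confidence (its peak probability) — purely to illustrate the idea of adaptive-weight model fusion; the probabilities are made up.

```python
import numpy as np

def adaptive_fusion(probs_list):
    """Fuse per-model class posteriors with confidence-based weights.

    probs_list: list of 1-D probability vectors, one per modality
    (e.g. the image stream and the skeletal-trajectory stream).
    Each model's weight is its own peak probability, renormalised to
    sum to 1 -- an assumed scheme, not necessarily the thesis's.
    """
    conf = np.array([p.max() for p in probs_list])
    w = conf / conf.sum()
    fused = sum(wi * p for wi, p in zip(w, probs_list))
    return fused / fused.sum()

# Toy posteriors over 4 classes from the two modalities.
p_image = np.array([0.70, 0.10, 0.10, 0.10])   # confident image model
p_skel  = np.array([0.30, 0.40, 0.20, 0.10])   # less confident trajectory model

fused = adaptive_fusion([p_image, p_skel])
print(fused.argmax())  # prints 0: the confident model dominates the vote
```

Fixed-weight model fusion is the special case where `w` is a constant vector chosen on a validation set, and feature fusion instead concatenates the two feature streams before a single classifier.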
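The attention mechanism of contribution 3 reweights the encoder's per-frame states at every decoding step, so that explicit frames receive large weights and vague frames are suppressed. As a sketch, using dot-product scoring (an assumed scoring function; the thesis may use a different one):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def attend(enc_states, dec_h):
    """Dot-product attention over the encoder's per-frame states.

    enc_states: (T, H) array, one hidden state per video frame.
    dec_h:      (H,) current decoder hidden state.
    Returns the context vector and the attention weights.
    """
    scores = enc_states @ dec_h        # (T,) one relevance score per frame
    alphas = softmax(scores)           # attention weights, sum to 1
    context = alphas @ enc_states      # (H,) weighted summary of the frames
    return context, alphas

rng = np.random.default_rng(1)
T, H = 20, 32                          # toy sizes: 20 frames, hidden size 32
enc_states = rng.standard_normal((T, H))  # stand-in per-frame encoder outputs
dec_h = rng.standard_normal(H)            # current decoder hidden state

context, alphas = attend(enc_states, dec_h)
```

At each decoding step the context vector is fed to the decoder alongside its recurrent state, so different steps can focus on different frames of the same video.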
Keywords/Search Tags: Sign Language Recognition, Long Short-Term Memory, Convolutional Neural Network, Multimodal Fusion, Attention Mechanism