
Research On Dataset Building And Recognition Of Chinese Sign Language Light Field

Posted on: 2021-04-10    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W T Wang    Full Text: PDF
GTID: 1488306470467424    Subject: Computer Science and Technology
Abstract/Summary:
Sign language, a special language communicated through action and vision, is an important tool for the hearing impaired to communicate with each other and exchange ideas. According to the World Federation of the Deaf, there are about 70 million people with hearing impairments in the world; 56 million of them are unable to receive education because sign language education resources are limited, and they cannot communicate effectively even using pen and paper. Statistics indicate that China currently has at least 20.57 million hearing-impaired people, yet sign language interpreters, especially highly trained professional interpreters, are extremely scarce. This makes it very difficult for the hearing impaired to access information and integrate into mainstream society.

With the rapid development of computer technologies such as pattern recognition, machine learning, image processing, and computer vision since the 1990s, research on dynamic gesture (sign language) recognition has gradually become a research hotspot and has made meaningful progress. As a fine-grained video recognition task, however, dynamic gesture (sign language) recognition remains very challenging: complex backgrounds, varying lighting conditions, uncertain viewpoints, gesture occlusion, and other issues make dynamic gestures difficult to recognize, and the subtle and complex variation of gestures brings further challenges to the recognition task.

As research on sign language recognition deepens, the demand for sign language datasets also grows. Sign language datasets of various sizes and characteristics have been released at home and abroad, but there is still no sign language dataset based on the light field. Traditional visual sign language data observes the three-dimensional world through a two-dimensional sensor, which discards some spatial information. Compared with traditional data forms, light field data is a more complete representation of
the scene, recording more spatial geometric information.

In this dissertation, a large-scale dynamic light field dataset for sign language was built; an attention-based Epitome-Net with better feature fusion was proposed for fine-grained sign language video recognition; and a bi-directional generative adversarial transfer learning model was proposed for weakly labeled sign language recognition. The main contributions of the dissertation are summarized as follows.

First, a light field dataset for Chinese sign language was established. To address the problems of arranging a large number of cameras and over-collecting redundant information in the light field acquisition environment, an optimization model for camera arrangement was proposed. With the optimal camera arrangement, sign language light field data of different signers under different lighting modes were collected, and the dataset was calibrated and preprocessed.

Second, a multi-stream Epitome-Net based on self-attention and mixed-attention mechanisms was proposed for sign language recognition. The multi-stream data comprises the original Epitome, the optical flow Epitome, the edge Epitome, and the edge optical flow Epitome, all computed from the source video clips. The Epitomes are fed into a shape branch and a motion branch. A self-attention module is applied after each convolutional layer to enhance salient temporal and spatial features, and a mixed-attention module complementarily fuses the two stream features within each branch. The features from the last convolutional layer of each branch network are concatenated to obtain Epitome-level recognition results; finally, all Epitome-level results vote to produce the classification of the gesture video. Experimental results show that the proposed model is highly robust.

Third, this dissertation implemented transfer learning between real sign language data and virtual sign language data based on a bi-directional generative adversarial network. Through bi-directional
adversarial generation networks, features of real sign language data are transferred to the virtual sign language domain. The corresponding fake virtual sign language data and fake real sign language data are generated and fed into two classification networks to identify the sign language word.
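The attention and voting pipeline of the second contribution can be illustrated with a minimal numpy sketch. This is not the dissertation's implementation: the dot-product self-attention, the mean-pooled channel gates used for mixed-attention fusion, and all shapes are assumptions chosen only to show the data flow (per-position self-attention on a feature map, complementary gating of two streams, majority voting over Epitome-level predictions).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat):
    """Reweight a (C, N) feature map by dot-product self-attention over
    its N positions, with a residual connection (a common formulation;
    the dissertation's exact module is not specified in the abstract)."""
    C, N = feat.shape
    attn = softmax(feat.T @ feat / np.sqrt(C), axis=-1)  # (N, N) affinities
    return feat + feat @ attn.T                          # residual output

def mixed_attention_fuse(a, b):
    """Fuse two stream features (C, N): each stream is gated by channel
    weights derived from the *other* stream (complementary fusion)."""
    gate_a = softmax(b.mean(axis=1))  # (C,) channel weights from stream b
    gate_b = softmax(a.mean(axis=1))  # (C,) channel weights from stream a
    return gate_a[:, None] * a + gate_b[:, None] * b

def vote(epitome_logits):
    """Majority vote over per-Epitome class predictions, giving the
    final classification of the gesture video."""
    preds = np.argmax(epitome_logits, axis=1)  # one prediction per Epitome
    return int(np.bincount(preds).argmax())
```

For example, three Epitomes predicting classes 1, 0, 1 would vote class 1 for the whole video.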
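The real-to-virtual direction of the third contribution can likewise be sketched as a single forward pass. Everything here is an assumption for illustration: linear numpy maps stand in for the generator, discriminator, and classifier networks; the dimensions, the cycle-style reconstruction term tying the two directions together, and the names `G_r2v`, `G_v2r`, `D_v`, `C_v` are all hypothetical. The symmetric virtual-to-real path described in the abstract would mirror this one.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, y):
    # Binary cross-entropy, clipped for numerical safety.
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

# Hypothetical sizes: 64-d features, 10 sign-word classes, batch of 8.
D, K, B = 64, 10, 8
G_r2v = rng.normal(0, 0.1, (D, D))  # generator: real -> virtual domain
G_v2r = rng.normal(0, 0.1, (D, D))  # generator: virtual -> real domain
D_v   = rng.normal(0, 0.1, (D, 1))  # discriminator for the virtual domain
C_v   = rng.normal(0, 0.1, (D, K))  # classifier for virtual-domain data

real   = rng.normal(size=(B, D))     # real sign-language features
labels = rng.integers(0, K, size=B)  # sign-word labels

# Map real features into the virtual domain; the generator tries to
# fool the virtual-domain discriminator (target 1 = "looks virtual").
fake_virtual = real @ G_r2v
adv_loss = bce(sigmoid(fake_virtual @ D_v).ravel(), np.ones(B))

# The generated fake virtual data is classified to identify the word.
probs = softmax(fake_virtual @ C_v)
cls_loss = float(-np.log(probs[np.arange(B), labels]).mean())

# Mapping back should reconstruct the input (cycle-style constraint;
# an assumed way of coupling the two generation directions).
cyc_loss = float(np.mean((fake_virtual @ G_v2r - real) ** 2))

gen_loss = adv_loss + cls_loss + cyc_loss  # combined generator objective
```

In training, `gen_loss` would be minimized over the generators and classifier while the discriminators are trained against it; here the pass only shows how the three terms combine.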
Keywords/Search Tags:Sign Language Recognition, Sign Language Light Field, Deep Learning, Transfer Learning, Attention Mechanism