
Research On Key Technologies Of Sign Language Recognition For Deaf-mutes

Posted on: 2020-11-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Liang
Full Text: PDF
GTID: 1367330578976508
Subject: Education Technology
Abstract/Summary:
The number of people with disabilities is very large. With the increasing demand for special education, building an open and integrated modern education system, so that educational development achieves more and its benefits reach people with disabilities more equitably, is an inevitable trend. As a major venue for informal learning by people with disabilities, the science and technology museum is one of the important channels through which they receive education. The communication barriers faced by deaf-mutes include difficulty obtaining information about exhibits and staff who cannot understand sign language. Using emerging information technology to recognize sign language automatically is therefore very helpful for communication between deaf-mute and hearing people. Moreover, research on sign language recognition (SLR) has important practical significance for building a harmonious society and improving the nationwide education system. At the same time, as the most intuitive form of bodily expression, sign language can help upgrade human-computer interaction (HCI) to a more natural and convenient mode. SLR is therefore a research hotspot in the field of artificial intelligence (AI). In recent years, as the vanguard of a new round of the AI revolution, deep learning has injected new vitality into pattern recognition and computer vision. With the application of Kinect 2.0 and other new interactive devices, SLR research has also been presented with new opportunities.

At present, SLR faces several key challenges. (1) The effectiveness of deaf-mute sign language data sets is difficult to guarantee. On the one hand, training a model to achieve user-independent SLR requires a large amount of demonstration data from different people. On the other hand, few studies use data actually collected from deaf-mutes; where standard sign language data are used, the collected data are small in scale and poor in fault tolerance, and individual differences are in practice ignored. (2) The actual application scenarios of sign language are often complicated, and objective factors such as background and illumination strongly interfere with the recognition performance of an algorithm. (3) Compared with conventional gestures, sign language sequences are richly ideographic, involve flexible movements, and suffer from limb occlusion, which makes it difficult to design discriminative sign language representations. (4) The ultimate goal of SLR is the recognition of continuous sign language, but continuous sign language contains redundant transition frames that do not belong to any sign language word and seriously hinder recognition.

Against this background, this dissertation focuses on SLR through deep learning, exploring three-dimensional Convolutional Neural Networks (3D CNNs), Recurrent Neural Networks (RNNs), residual networks, attention mechanisms, and multi-modal fusion. Based on these models, the recognition of dynamic sign language keywords and continuous sign language sequences is realized, and some meaningful results are obtained.

1. To address problem (1), the stages of SLR technology accompanying the continuous evolution of sensor equipment are analyzed in section 2, and an SLR scheme based on computer vision and a new generation of interactive equipment is proposed. To overcome interference from illumination and background noise, a multi-modal homologous data acquisition scheme using the Kinect V2 sensor is explored, and an open sign language data set is constructed.

2. To address problem (2), an SLR model based on three-dimensional Convolutional Neural Networks and multi-modal homologous data is proposed in section 3. The model extracts spatial and temporal features from video streams, capturing motion information from the variation in depth between each pair of consecutive frames. To further boost performance, multiple modal streams, including infrared video, contour video, and skeleton data, are used as inputs to the architecture. The predictions of the different sub-networks are fused, making the model more accurate under the interference of illumination and background noise in different scenes and ensuring the robustness of the network to missing signals.

3. To address problem (3), a new SLR algorithm based on wide residual networks and long short-term memory networks is proposed in section 4. First, spatial and temporal features are extracted by fine-tuned 3D convolutional neural networks. Next, a bidirectional convolutional long short-term memory network is used to further account for the temporal structure of the image sequences. Finally, these higher-level features are sent to the wide residual networks for final SLR. An effective fusion strategy for sub-networks with different input modalities is proposed, which effectively improves the model's ability to recognize sign language.

4. To address problem (4), a continuous SLR algorithm based on an attention mechanism and long short-term memory networks is proposed in section 5. Faced with continuous sign language, the algorithm uses a pseudo-three-dimensional residual network combined with a balanced hinge loss function to detect the transition frames in a long sequence and determine the temporal boundaries of words. In the recognition stage, the pseudo-3D residual network extracts spatial and short-term dynamic features of sign language from the video stream, and a convolutional long short-term memory network integrating an attention mechanism encodes these short-term spatial features, so that the long-term and contextual information of the sign language is fully captured. In the classification stage, a wide residual module is applied to accurately represent the spatial features and obtain the final recognition results.
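The spatiotemporal feature extraction in contribution 2 rests on 3D convolution, which slides a filter over time as well as space so that motion between consecutive frames is captured directly. The following is a minimal NumPy sketch of the idea, not the dissertation's actual network; the kernel shape, the temporal-difference filter, and the toy video are invented for illustration:

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Naive 3D convolution ("valid" padding) over a video volume.

    video:  (T, H, W) stack of grayscale frames
    kernel: (kt, kh, kw) spatiotemporal filter
    Returns a (T-kt+1, H-kh+1, W-kw+1) feature volume: the filter is
    convolved jointly over the time and space axes.
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(video[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

# A temporal-difference kernel: responds to change between consecutive
# frames, i.e. to motion, while averaging over a 3x3 spatial window.
motion_kernel = np.zeros((2, 3, 3))
motion_kernel[0] = -np.ones((3, 3)) / 9.0
motion_kernel[1] = np.ones((3, 3)) / 9.0

video = np.zeros((4, 8, 8))
video[2:, :, 4:] = 1.0            # a bright region appears at frame 2
features = conv3d_valid(video, motion_kernel)
print(features.shape)             # (3, 6, 6)
```

The output at temporal index 1 (frames 1 to 2) is nonzero exactly where the region appeared, while static frames produce zero response; a learned 3D CNN stacks many such filters and learns them from data.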
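Contribution 2 also fuses the predictions of sub-networks trained on different modal streams (infrared, contour, skeleton). One common way to realize this, sketched here as an assumption rather than the dissertation's exact fusion strategy, is late fusion: average the per-modality class probabilities, simply dropping any stream that is missing, which is what makes the combined model robust to missing signals:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_predictions(logits_by_modality, weights=None):
    """Late fusion: weighted average of per-modality class probabilities.

    logits_by_modality: dict mapping modality name -> logit vector, or
    None when that stream is unavailable. Missing streams are dropped
    and the remaining weights renormalized.
    """
    available = {m: l for m, l in logits_by_modality.items() if l is not None}
    if weights is None:
        weights = {m: 1.0 for m in available}
    total = sum(weights[m] for m in available)
    probs = sum(weights[m] * softmax(l) for m, l in available.items()) / total
    return probs

# Hypothetical logits over 3 sign classes from each sub-network.
logits = {
    "infrared": np.array([2.0, 0.5, 0.1]),
    "contour":  np.array([1.5, 1.0, 0.2]),
    "skeleton": None,                      # this stream dropped out
}
fused = fuse_predictions(logits)
print(fused.argmax())  # 0
```

Weighting the modalities (e.g. by their validation accuracy) instead of averaging uniformly is a natural refinement of the same scheme.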
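In contribution 4, an attention mechanism lets the network weight informative frames over redundant transition frames when encoding a continuous sequence. A minimal sketch of attention pooling over per-frame features follows; the query vector stands in for the network's learned context, and the feature values are synthetic:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(frame_features, query):
    """Attention pooling over time.

    Score each frame feature against a query vector, softmax the scores
    into an attention distribution, and return the weighted sum. Frames
    carrying discriminative sign motion receive high weight; redundant
    transition frames are suppressed.

    frame_features: (T, D) per-frame feature vectors
    query:          (D,)  attention query (here a fixed stand-in for a
                          trainable parameter)
    """
    scores = frame_features @ query          # (T,) alignment scores
    weights = softmax(scores)                # attention distribution
    context = weights @ frame_features       # (D,) pooled representation
    return context, weights

rng = np.random.default_rng(0)
T, D = 6, 4
feats = rng.normal(size=(T, D))
feats[3] += 3.0        # frame 3 carries the salient sign motion
query = np.ones(D)
context, weights = attention_pool(feats, query)
print(weights.argmax())  # 3
```

In the dissertation's architecture this weighting is applied inside a convolutional LSTM encoder rather than as a single pooling step, but the score-softmax-weighted-sum pattern is the same.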
Keywords/Search Tags:Intelligent education system, Sign language recognition, Computer vision, Deep learning