Chinese Sign Language Recognition For Large Vocabulary

Posted on: 2021-11-11 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S L Huang | Full Text: PDF
GTID: 1488306323962719 | Subject: Signal and Information Processing
Abstract/Summary:
Sign language recognition uses computer technology to convert sign language signals captured by sensors or cameras into text. It can ease communication difficulties for deaf and hard-of-hearing people and promote the development of human-computer interaction, so it has considerable research significance and social value. In recent years, with the development of computer technology, sign language recognition has progressed from simple static gesture recognition, to isolated sign language word recognition, and finally to complex continuous sign language sentence recognition, and a series of research results have emerged. In particular, the wide application of deep learning has brought new vitality and new research methods to the field. However, many problems remain unresolved: accurate and efficient recognition still faces many challenges, and recognition technology for different scenarios still needs continued improvement. Against this background, this dissertation takes Chinese sign language as its research object, studies isolated-word and continuous-sentence sign language recognition under a large vocabulary, and proposes corresponding recognition methods. The main innovations and contributions of this dissertation are as follows:

1. To address the problem that the large amount of redundant information in sign language signals harms efficient and accurate recognition, a Chinese sign language recognition method based on the spatiotemporal key information of the sign language signal is proposed. Key information is extracted along two dimensions, space and time. In the spatial dimension, a composite sign language feature is designed, consisting of convolutional neural network hand-shape features of the signer's two hands and skeleton joint trajectories that characterize the spatial movement of the sign. In the temporal dimension, a subsequence of key frames containing representative gestures is extracted and used in place of the original input sequence, which reduces model complexity. Then, based on an analysis of the natural characteristics of Chinese sign language vocabulary, fine-grained sign language sub-word units, rather than whole sign language words, are used as the basic target units of recognition. An encoder-decoder model based on a two-layer long short-term memory network is proposed to model the sequence-to-sequence relationship and achieve end-to-end recognition of isolated Chinese sign language. In addition, to handle the asynchrony between the different modal features of sign language, a model-level fusion scheme with better performance is proposed.

2. In a sign language signal, high-level semantic information is composed of adjacent low-level semantic information, and the low-level information is concentrated within different short time periods. Traditional flat networks model such hierarchically structured signals poorly. To solve this problem, a Chinese sign language recognition method based on boundary adaptive learning is proposed. A learnable boundary detection unit adaptively learns the temporal boundary information of the sign language signal from the changes in the signal in the forward and backward directions, the sign language sequence is segmented according to this boundary information, and the low-level visual information is converted into semantic information in a higher-level coding structure. This completes the hierarchical encoding of sign language and improves recognition accuracy. At the same time, to mitigate the effect of long sign language sequences on recognition, a window attention model with relaxed attention weight constraints is proposed, which decodes the encoded information within a limited window to keep attention stable. In addition, to explore the potential of sign language sub-words for modeling large vocabularies and long sequences, a method is proposed that uses sub-word units to integrate isolated word recognition and continuous sentence recognition into a single model. This better matches real situations, in which it is not known in advance whether the sign language to be recognized is a single word or a sentence.

3. To address the lack of path constraints in the encoder-decoder model, and considering the monotonic alignment between the sign language input sequence and the output sequence, a Chinese sign language recognition method based on improved connectionist temporal classification constraints is proposed. Based on an analysis of the "spike" problem in connectionist temporal classification, two improved methods are proposed. First, because non-blank labels in the output path are swamped by a large number of blank labels in the original method, a weighted connectionist temporal classification method is proposed: by assigning different weights to different nodes in a path, the network is made less likely to output paths with a high proportion of blank labels. Second, because the input segment corresponding to each label in the temporal signal is known a priori to span a certain length, a connectionist temporal classification method based on a length information constraint is proposed: by constraining paths with this prior length information, the network avoids outputting paths in which the segment for an individual label is too long or too short, and so outputs more reasonable and feasible paths. Finally, the improved connectionist temporal classification model is used as supervision for the encoder in the encoder-decoder sign language recognition method, constraining the output sequence to align monotonically with the input sequence, and the whole model is trained jointly to improve performance in real large-vocabulary scenarios.
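
To make the "spike" problem concrete, the following minimal sketch shows the standard connectionist temporal classification path-to-label mapping (merge repeated symbols, then drop blanks) together with two quantities the improved methods act on: the proportion of blank labels in a path and the frame length of each non-blank segment. It is not the dissertation's weighted or length-constrained method. The blank symbol `"-"`, the function names, and the toy path are assumptions made purely for illustration.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol chosen for this illustration


def ctc_collapse(path, blank=BLANK):
    """Standard CTC mapping: merge consecutive repeats, then remove blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != blank]


def blank_ratio(path, blank=BLANK):
    """Fraction of frames in the path that carry the blank label."""
    if not path:
        return 0.0
    return sum(1 for symbol in path if symbol == blank) / len(path)


def segment_lengths(path, blank=BLANK):
    """Frame length of each consecutive run of a non-blank label."""
    return [
        (symbol, len(list(run)))
        for symbol, run in groupby(path)
        if symbol != blank
    ]


# A toy "spiky" path: each label fires for only one or two frames,
# and most frames are blank.
spiky_path = ["-", "-", "A", "-", "-", "-", "B", "B", "-", "C", "-", "-"]

labels = ctc_collapse(spiky_path)
ratio = blank_ratio(spiky_path)
lengths = segment_lengths(spiky_path)

print(labels)   # label sequence recovered from the path
print(ratio)    # share of blank frames
print(lengths)  # per-label segment lengths in frames
```

In this toy path, eight of the twelve frames are blank and no label spans more than two frames. The weighted variant is described as discouraging paths with this kind of blank proportion, and the length-constrained variant as discouraging segments of this kind of length.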
Keywords/Search Tags:Chinese sign language recognition, deep learning, spatiotemporal key information, adaptive boundaries, connectionist temporal classification