Sign Language Translation Based On Spatial-temporal Graph Convolutional Network

Posted on: 2021-05-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Z Wang
GTID: 2428330614463870 | Subject: Signal and Information Processing
Abstract/Summary:
Sign language translation is a comprehensive task spanning computer vision, natural language processing, and pattern recognition. It has broad application prospects in intelligent scene recognition and sign language video retrieval, and it is of great significance in helping deaf and hard-of-hearing people take part in everyday communication. This thesis uses a graph convolutional network to extract spatiotemporal features from human skeleton graph data, models the resulting sequence with an encoder-decoder network, and produces text as output. On this basis, a sign language translation method based on the Spatial-Temporal Graph Convolutional Network (ST-GCN) is proposed.

Deep models based on convolutional neural networks are widely used to process Euclidean data in fields such as image recognition and video analysis. Human skeleton joints are naturally non-Euclidean data and cannot be processed directly by conventional deep models; they are usually converted into Euclidean data first, which loses their inherent structural information. In this thesis, ST-GCN is applied directly to skeleton joint data to recognize sign language actions. First, the coordinates of the skeleton joints are obtained by pose estimation. After the skeleton joint graph is constructed, spatiotemporal features are extracted with ST-GCN, and a softmax classifier classifies the sign language actions in the video. Experimental results on the sign language dataset show that the proposed method can extract spatiotemporal features directly from skeleton joint data and achieves good results on the sign language action recognition task.

Sign language movements in sign language videos consist of arm movements and hand movements, which differ significantly in movement amplitude and semantic precision. Methods based on global features struggle to distinguish arm movements from hand movements, which limits the robustness of their feature representation in complex sign language action recognition. Therefore, a two-stream spatial-temporal graph network model is proposed for the human skeleton graph data. It extracts the spatiotemporal features of the torso and the hands separately, so that the differences in motion between body parts are captured effectively. A feature aggregation method then processes the resulting feature sequence, and an attention-based encoder-decoder network translates it into text. Experimental results on the sign language dataset show that the proposed method obtains a more robust feature representation and effectively improves accuracy on the sign language translation task.

To address the over-smoothing problem in graph convolutional networks, a spatial-temporal graph convolution structure based on residual connections is proposed for sign language translation. Embedding this structure effectively addresses the inter-domain information crosstalk that arises when spatial-temporal graph convolutional layers are stacked. Finally, the Transformer structure is introduced to build a sign language translation model that combines it with the residual spatial-temporal graph convolutional network. Experimental results on the public sign language dataset RWTH-PHOENIX Weather 2014 show that the sign language translation method based on the spatial-temporal graph convolutional network is feasible and effective, and that it has important reference value.
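
The pipeline described in the abstract (build a skeleton joint graph, apply spatial-temporal graph convolution, add a residual connection) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names `normalized_adjacency` and `st_gcn_block`, the four-joint toy skeleton, the fixed temporal kernel, and the placement of the residual before the ReLU are all assumptions made for illustration.

```python
import numpy as np


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = np.eye(num_joints)  # self-loops keep each joint's own features
    for i, j in edges:
        a[i, j] = 1.0
        a[j, i] = 1.0
    deg = a.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ a @ d_inv_sqrt


def st_gcn_block(x, a_hat, w_spatial, temporal_kernel, residual=True):
    """One illustrative spatial-temporal graph convolution block.

    x:               (T, V, C_in) joint features over T frames and V joints
    a_hat:           (V, V) normalized adjacency matrix
    w_spatial:       (C_in, C_out) channel-mixing weights
    temporal_kernel: (K,) weights of a 1D filter along time, K odd
    """
    # Spatial step: aggregate neighbouring joints per frame, then mix channels.
    h = np.einsum("uv,tvc->tuc", a_hat, x) @ w_spatial
    # Temporal step: filter each joint's trajectory along time, zero-padded
    # so that the number of frames is unchanged.
    k = len(temporal_kernel)
    pad = k // 2
    h_pad = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
    out = sum(temporal_kernel[i] * h_pad[i:i + h.shape[0]] for i in range(k))
    # Residual connection (assumed form): add the block input back when the
    # shapes match, so stacked blocks retain the original joint features.
    if residual and out.shape == x.shape:
        out = out + x
    return np.maximum(out, 0.0)  # ReLU


if __name__ == "__main__":
    # Hypothetical 4-joint chain, e.g. shoulder-elbow-wrist-hand.
    edges = [(0, 1), (1, 2), (2, 3)]
    a_hat = normalized_adjacency(4, edges)
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4, 3))  # 5 frames, 4 joints, 3 channels
    w = rng.normal(size=(3, 3))
    y = st_gcn_block(x, a_hat, w, np.array([0.25, 0.5, 0.25]))
    print(y.shape)
```

Under the same assumptions, a two-stream variant would apply such blocks separately to a torso subgraph and a hand subgraph and then combine the two feature sequences before they reach the encoder-decoder network.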
Keywords/Search Tags: Sign Language Translation, Graph Convolutional Network, Encoder-decoder Network, Attention Mechanism, Residual Connection