| Emotions play an essential role in human information exchange and behavioral cognition. People hope that human-machine interaction (HMI) can support communication at the emotional level rather than being limited to traditional mechanical command execution. Accurate identity and emotional information can help robots analyze and understand human intentions and thus provide more secure, active, and attentive services. Gait recognition is a biometric technology that recognizes the identity and emotional state of a walker by analyzing gait characteristics. Recently, there has been rapid growth in work on gait-based identity recognition and emotion recognition, with applications including human-computer interaction, identity authentication, security monitoring, assisted diagnosis of depression, and emotional robots, showing great vitality and development potential. Previous studies treat gait-based identity recognition and emotion recognition as separate, independent subproblems, ignoring the rich correlation between them. Moreover, most gait biometric recognition techniques apply only to ideal conditions; many challenges remain under noisy data, restricted datasets, and multiple viewpoints. This paper presents an in-depth study of the key issues in gait-based joint recognition of identity and emotion based on deep learning, and its main contents are summarized as follows: · Gait identity information and emotion information are entangled and related in complex ways. The emotional factors in gait features can therefore directly affect the results of identity recognition. Conversely, applications of emotion recognition often need to recognize the target's identity at the same time to reduce the influence of individual differences. This paper presents a deep learning network model based on a multi-task learning (MTL) architecture, which shares information among multiple gait-based subtasks and learns a unified gait
feature space to perform gait identity and emotion recognition. Moreover, a novel attention-enhanced temporal graph convolutional network (AT-GCN) is proposed to effectively capture the spatial dependencies and temporal dynamics of gait skeleton joints through spatial and temporal attention mechanisms. With the MTL structure and the AT-GCN module, this paper achieves state-of-the-art performance on identity and emotion recognition benchmarks. · Existing skeleton-based gait recognition methods inevitably suffer from distorted and missing data in human skeleton pose estimation. To address the problem that noise in gait skeleton data degrades the performance of identity and emotion recognition models, this paper proposes a denoising autoencoder model based on the AT-GCN network and a Transformer self-attention-based denoising encoder, which can automatically reconstruct gait skeleton trajectories and correct the errors introduced by the pose estimation algorithm. Moreover, by using a temporal smoothness constraint on gait trajectories and a purpose-designed Siamese network structure, this paper encodes the features of input trajectories into a latent space so as to reduce intra-class variation and increase inter-class variation of the embedding vectors. Experiments demonstrate that our method improves robustness against inaccurate skeleton estimation and achieves substantial improvements over mainstream skeleton-based methods for gait recognition tasks. · To address the low recognition rate and weak generalization ability of the gait-based joint identity and emotion recognition model on restricted datasets, this paper proposes a generative adversarial network model for emotional gait conversion that realizes the mutual transformation between natural and emotional gait. This paper employs two autoencoders to disentangle latent identity-specific and emotion-specific representations, using two auxiliary classifiers to ensure minimal entangled
information, and a unified decoder to generate synthetic emotional gait samples. Using the emotional conversion model, two data augmentation strategies are designed to increase the size and diversity of the original dataset. Experimental results show that the emotion classifiers trained on the augmented dataset are competitive with state-of-the-art gait emotion recognition systems. · To address the problem that video data captured from a single view leave large ambiguities when inferring the 3D coordinates of the gait skeleton, this paper proposes a multi-view fusion method for 3D pose estimation based on the Skinned Multi-Person Linear (SMPL) model, which obtains more accurate 3D human shape and pose information from multi-view videos through a video frame synchronization algorithm and cross-view projection consistency constraints. In addition, to address the limited gait feature expression capability of existing skeleton-based methods, a point-based approach to gait recognition is proposed that divides the point cloud data into regions according to the spatial locations of the joints. It integrates point-wise, region-wise, and frame-wise encoding modules to capture the 3D spatial geometric dependencies and dynamic features of the gait point cloud. The experimental results demonstrate the unique advantages and great potential of the point-based gait recognition approach in joint identity and emotion recognition tasks under multi-view conditions. |
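
The multi-task idea in the first contribution, one shared gait feature feeding both an identity classifier and an emotion classifier, can be illustrated with a small joint loss. This is only a minimal sketch under stated assumptions: the abstract does not specify the loss, so the weighted sum of two cross-entropy terms, the bias-free linear heads, the toy weights, and the names `joint_loss` and `emo_weight` are all hypothetical, and the plain feature vector stands in for the output of a shared encoder such as AT-GCN.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy of one sample given raw class scores and the true class index."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def linear(weights, features):
    """Bias-free linear head: one row of weights per output class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def joint_loss(shared, id_head, emo_head, id_label, emo_label, emo_weight=1.0):
    """Weighted sum of identity and emotion losses from ONE shared feature vector.

    Assumption: a simple weighted sum; the source does not state its loss.
    """
    loss_id = softmax_cross_entropy(linear(id_head, shared), id_label)
    loss_emo = softmax_cross_entropy(linear(emo_head, shared), emo_label)
    return loss_id + emo_weight * loss_emo

# Toy example (hypothetical values): a 3-dimensional shared gait embedding,
# 2 identities, and 3 emotion classes.
shared = [0.5, -1.0, 2.0]
id_head = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
emo_head = [[0.2, 0.1, 0.0], [0.0, 0.3, 0.4], [-0.1, 0.0, 0.6]]
loss = joint_loss(shared, id_head, emo_head, id_label=0, emo_label=2, emo_weight=0.5)
print(f"joint loss: {loss:.4f}")
```

Because both heads read the same `shared` vector, gradients from either task would update the shared encoder in a real training loop, which is how the two subtasks exchange information; `emo_weight` controls the balance between them.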