With the continuous development of China's economy and society, people's demand for intelligence in daily life is also increasing. In the field of intelligent life, human emotion recognition plays an increasingly important role, for example in the assisted diagnosis of mental illness, the assisted treatment of autism in children, and driver assistance. Recognizing human emotions allows robots to serve human beings better, but several situations still challenge the recognition task: emotional expression varies greatly between individuals, unimodal information such as vision or audio alone is often insufficient to interpret an emotion, and facial occlusion by masks makes expression recognition difficult. To address these problems, this thesis proposes a multimodal emotion recognition algorithm and an algorithm for facial expression recognition under occlusion, and builds an emotion recognition robot system that is validated on the NAO robot. The main contents of the thesis are as follows.

To address audio-visual emotion recognition, this thesis proposes a multimodal emotion recognition algorithm based on attention fusion. For video, a residual channel-spatial attention (RCS-Attention) structure is proposed: faces are detected with an improved Multi-task Cascaded Convolutional Networks (MTCNN) model, and facial expressions are analyzed with an RCS-VGG19 (Residual Channel-Spatial Attention VGG19) network. For audio, a self-attention mechanism is combined with a Bidirectional Long Short-Term Memory (BiLSTM) network to extract speech emotion features. A two-stage attention fusion mechanism is then proposed to fuse the features of the two modalities. The method achieves 74.25% accuracy for expression recognition on the FER2013 dataset and 86.16% accuracy for emotion recognition on the RAVDESS dataset.
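To make the video branch concrete, a residual channel-spatial attention block can be sketched as follows. This is a minimal PyTorch sketch assuming a CBAM-style design; the class name, reduction ratio, and spatial kernel size are illustrative assumptions rather than the exact RCS-Attention definition used in the thesis.

import torch
import torch.nn as nn

class RCSAttention(nn.Module):
    """Illustrative residual channel-spatial attention block.

    Channel attention re-weights feature channels, spatial attention
    re-weights locations, and a residual connection preserves the input.
    """

    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        # Channel attention: squeeze spatially, excite channel-wise.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: pool over channels, then a conv produces a mask.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2),
            nn.Sigmoid(),
        )

    def forward(self, x):
        out = x * self.channel_mlp(x)                    # channel re-weighting
        avg_map = out.mean(dim=1, keepdim=True)          # channel-wise average
        max_map, _ = out.max(dim=1, keepdim=True)        # channel-wise max
        out = out * self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x + out                                   # residual connection

In an RCS-VGG19-style backbone, a block of this kind would typically be inserted after selected convolutional stages of VGG19 so that later layers receive attention-refined features.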
To address masked facial expression recognition, this thesis proposes an occluded expression recognition algorithm based on generative adversarial networks. First, for mask occlusion, currently the most common case, a lightweight detection algorithm named EFFICIENT-YOLO is proposed. It uses the EfficientNet network to optimize the YOLOv3 (You Only Look Once version 3) algorithm, reducing the computational cost by about two thirds and improving detection accuracy by 3.28% compared with the original YOLOv3. Second, for faces with occlusions, an improved Generative Adversarial Network (GAN) is proposed to remove the occlusion from the face, and the restored face image is fed into the expression recognition network for classification. Experiments show that the recognition accuracy on masked expressions is improved by 14.95% on average compared with the original expression recognition algorithm.

Finally, this thesis uses the NAO robot as an experimental platform to validate the effectiveness of the proposed multimodal emotion recognition algorithm and masked expression recognition algorithm. In the validation of the multimodal emotion recognition algorithm, a preliminary exploration of emotion monitoring for depressed patients is carried out, and seven robot body movements are defined for the different emotion categories so that the NAO robot can give corresponding motion feedback for each emotion. In the validation of the masked expression recognition algorithm, EFFICIENT-YOLO is verified, and three body movements are defined according to whether the face in the image wears a mask, realizing mask-wearing detection for people entering public places.
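As an illustration of how a recognized emotion could drive the robot's feedback, the following is a minimal sketch using the NAOqi Python SDK's ALProxy interface. The robot address, the emotion-to-behavior mapping, and the behavior names are hypothetical placeholders, not the seven movements actually designed in this thesis.

from naoqi import ALProxy

NAO_IP, NAO_PORT = "192.168.1.10", 9559   # hypothetical robot address

# Hypothetical mapping from recognized emotion labels to installed behaviors.
EMOTION_BEHAVIORS = {
    "happy":    "emotion_feedback/happy",
    "sad":      "emotion_feedback/sad",
    "angry":    "emotion_feedback/angry",
    "fear":     "emotion_feedback/fear",
    "disgust":  "emotion_feedback/disgust",
    "surprise": "emotion_feedback/surprise",
    "neutral":  "emotion_feedback/neutral",
}

def feedback(emotion):
    """Speak the recognized emotion and run the matching body movement."""
    tts = ALProxy("ALTextToSpeech", NAO_IP, NAO_PORT)
    behaviors = ALProxy("ALBehaviorManager", NAO_IP, NAO_PORT)
    tts.say("I think you feel " + emotion)
    name = EMOTION_BEHAVIORS.get(emotion, "emotion_feedback/neutral")
    if behaviors.isBehaviorInstalled(name):
        behaviors.runBehavior(name)

The mask-wearing feedback described above could follow the same pattern, with the detection result selecting one of the three body movements instead of an emotion category.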