Emotion recognition has become an active research topic in computer vision in recent years. Existing work mainly recognizes emotion from facial expressions alone or from body actions alone. Such single-modal algorithms generalize poorly, and their accuracy leaves room for improvement. To improve recognition accuracy, this thesis studies the recognition of characters' emotions in videos based on multi-modal feature fusion. It focuses on how to jointly exploit facial expression features and body action features to recognize the emotions of characters in video.

First, a character emotion recognition algorithm, FBER, that fuses facial expression and body action features is designed. In FBER, a C3D network extracts the spatio-temporal features of facial expressions and body actions in the video. The MOD algorithm then performs dictionary learning on the extracted feature vectors within a sparse coding tree framework, and an SVM classifier recognizes the emotions. Character emotion recognition experiments on the FABO dataset show that FBER is more accurate than single-modal algorithms based only on facial expressions or only on body actions. It is also more accurate than other algorithms that fuse facial expression and body action features.

Next, the quality of the extracted facial expression and body action features directly affects recognition performance. The C3D network is therefore improved so that these features can be extracted more effectively, resulting in an attention-based 3D convolutional network named AM-C3D. AM-C3D extends the CBAM attention mechanism to 3D convolution to form a 3DCBAM attention module, and then combines 3DCBAM with the C3D network. This makes better use of the channel and spatial features in the video, enhancing specific target regions of interest while weakening irrelevant background regions. Experiments on the real-world FABO dataset show that AM-C3D achieves better recognition performance than C3D.

Finally, AM-C3D is applied within FBER, yielding AM-FBER, an AM-C3D-based version of FBER that further improves the accuracy of character emotion recognition in videos. To test the effectiveness and practicality of AM-FBER, a relatively simple prototype system for recognizing emotions during movie watching is developed. The system uses AM-FBER to recognize the emotions of people watching videos. The application results show that AM-FBER can be applied effectively in this prototype system and can accurately identify viewers' emotions while they watch films.
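The abstract states that FBER fuses facial expression and body action features but does not say how. Feature-level fusion is one common option, and it is assumed in the sketch below rather than taken from the thesis. The descriptor of each modality (for example, a C3D feature vector for the face clip and one for the body clip) is L2-normalized, and the two are then concatenated before dictionary learning and classification. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_face_body(face: Sequence[float], body: Sequence[float]) -> List[float]:
    """Hypothetical feature-level fusion: normalize each modality separately so that
    neither dominates because of its scale, then concatenate [face; body]."""
    return l2_normalize(face) + l2_normalize(body)


# Usage with toy 2-D descriptors (real C3D descriptors are much longer).
fused = fuse_face_body([3.0, 4.0], [0.0, 2.0])  # -> [0.6, 0.8, 0.0, 1.0]
```

Normalizing per modality before concatenation is a design choice. Without it, whichever network output has the larger magnitude would dominate the distances used by sparse coding and the SVM.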
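FBER learns a dictionary over the extracted feature vectors with MOD. This sketch assumes that MOD refers to the Method of Optimal Directions. Under that reading, dictionary learning alternates between two steps. A sparse-coding step (for example, OMP) fixes the dictionary D and solves for sparse codes A. A dictionary update then fixes A and minimizes ||X - D A||_F^2 in closed form, D = X A^T (A A^T)^(-1). The code shows only the dictionary-update step. It does not implement the sparse-coding step or the tree structure of the sparse coding tree, and all function names are hypothetical. Pure Python keeps it dependency-free; a real implementation would use a linear-algebra library.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; adequate for small K x K matrices."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("A A^T is singular (e.g. an atom is never used)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mod_dictionary_update(x: Matrix, a: Matrix) -> Matrix:
    """One MOD dictionary update D = X A^T (A A^T)^(-1), then unit-norm atoms.

    x: d x N matrix whose columns are feature vectors (e.g. C3D descriptors).
    a: K x N matrix of sparse codes from the preceding sparse-coding step.
    Returns the d x K dictionary.
    """
    at = transpose(a)
    d = matmul(matmul(x, at), inverse(matmul(a, at)))
    # Rescale each atom (column) to unit L2 norm, a common dictionary-learning convention.
    norms = [math.sqrt(sum(row[k] ** 2 for row in d)) or 1.0 for k in range(len(d[0]))]
    return [[row[k] / norms[k] for k in range(len(norms))] for row in d]


# Usage: data generated from a known unit-norm dictionary and full-rank codes
# is enough for a single update to recover that dictionary.
D_true = [[0.6, 0.0], [0.8, 0.6], [0.0, 0.8]]
A = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
X = matmul(D_true, A)
D = mod_dictionary_update(X, A)  # equals D_true up to rounding
```

The closed-form least-squares solve is what distinguishes MOD from K-SVD, which updates atoms one at a time. The price is inverting the K x K matrix A A^T at every iteration.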
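The abstract does not describe the internals of 3DCBAM. This sketch assumes one plausible reading: the 2D CBAM module with its pooling and convolution lifted to three dimensions. Channel attention pools each channel over time, height and width, then passes the average-pooled and max-pooled descriptors through a shared two-layer MLP. Spatial attention pools across channels at every (t, h, w) position and applies a k×k×k convolution; 2D CBAM uses a 7×7 kernel. Each map goes through a sigmoid and rescales the features. This is how the module can emphasize target regions and weaken background. The weights, kernel size, and function names are hypothetical, and the sketch does not show where the module sits inside C3D.

```python
import math
from itertools import product
from typing import List

Tensor3 = List[List[List[float]]]
Tensor4 = List[List[List[List[float]]]]  # indexed as x[channel][time][height][width]


def sigmoid(v: float) -> float:
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def channel_attention_3d(x: Tensor4, w1: List[List[float]], w2: List[List[float]]) -> Tensor4:
    """Scale each channel by sigmoid(MLP(avg-pool) + MLP(max-pool)), pooling over T, H, W.
    w1 is r x C and w2 is C x r: a shared two-layer MLP with a ReLU in between."""
    pooled_avg, pooled_max = [], []
    for ch in x:
        vals = [v for frame in ch for row in frame for v in row]
        pooled_avg.append(sum(vals) / len(vals))
        pooled_max.append(max(vals))

    def mlp(d: List[float]) -> List[float]:
        hidden = [max(0.0, sum(wi * di for wi, di in zip(wrow, d))) for wrow in w1]
        return [sum(wi * hi for wi, hi in zip(wrow, hidden)) for wrow in w2]

    weights = [sigmoid(a + m) for a, m in zip(mlp(pooled_avg), mlp(pooled_max))]
    return [[[[v * weights[c] for v in row] for row in frame] for frame in x[c]]
            for c in range(len(x))]


def spatial_attention_3d(x: Tensor4, kernel: List[Tensor3]) -> Tensor4:
    """Scale each voxel by sigmoid(conv3d([mean_c(x), max_c(x)])) with a 2-input-channel
    k x k x k kernel, zero padding k // 2 and no bias, so T x H x W is preserved."""
    T, H, W = len(x[0]), len(x[0][0]), len(x[0][0][0])
    maps = [
        [[[sum(ch[t][h][w] for ch in x) / len(x) for w in range(W)] for h in range(H)] for t in range(T)],
        [[[max(ch[t][h][w] for ch in x) for w in range(W)] for h in range(H)] for t in range(T)],
    ]
    k = len(kernel[0])
    p = k // 2
    att = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t, h, w in product(range(T), range(H), range(W)):
        s = 0.0
        for i, dt, dh, dw in product(range(2), range(k), range(k), range(k)):
            tt, hh, ww = t + dt - p, h + dh - p, w + dw - p
            if 0 <= tt < T and 0 <= hh < H and 0 <= ww < W:
                s += kernel[i][dt][dh][dw] * maps[i][tt][hh][ww]
        att[t][h][w] = sigmoid(s)
    return [[[[ch[t][h][w] * att[t][h][w] for w in range(W)] for h in range(H)] for t in range(T)]
            for ch in x]


def cbam_3d(x: Tensor4, w1, w2, kernel) -> Tensor4:
    # Channel attention first, then spatial attention, following the 2D CBAM ordering.
    return spatial_attention_3d(channel_attention_3d(x, w1, w2), kernel)


# Usage: a 2-channel, 2-frame, 2x2 feature map with all-zero (hypothetical) weights.
x = [[[[float(c + t + h + w) for w in range(2)] for h in range(2)] for t in range(2)] for c in range(2)]
zero_kernel = [[[[0.0] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]
y = cbam_3d(x, [[0.0, 0.0]], [[0.0], [0.0]], zero_kernel)
```

With all-zero weights, both attention maps equal sigmoid(0) = 0.5, so every activation is scaled by 0.25; this gives a convenient sanity check. Nested lists are used only for readability. A practical AM-C3D would rely on a deep-learning framework's 3D convolution and pooling layers.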