Facial Expression Recognition Based On Multi-scaled Feature Fusion

Posted on: 2024-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Lou
GTID: 2568307136499424
Subject: Electronic and communication engineering
Abstract/Summary:
Facial expression recognition is a widely studied area of computer vision with important applications in real-world scenarios such as human-computer interaction. Several problems remain open. Deep learning is the current mainstream approach, but during feature extraction with a deep neural network, the receptive field grows as layers are added while the resolution of the output feature map falls, so fine details are perceived poorly. Multi-scale feature fusion combines the shallow and deep features of an image, which can address both the lack of semantic information in shallow features and the loss of detail in deep features. To extract more effective features, an attention mechanism is also introduced; it focuses the network on information useful for facial expression recognition and improves recognition accuracy. This thesis studies facial expression recognition based on multi-scale feature fusion. The main research content and contributions are as follows:

(1) To address the loss of detail in deep features and the lack of semantic information in shallow features in deep neural networks, a facial expression recognition algorithm based on multi-scale feature fusion is proposed. On the FER2013 dataset, with VGG16, Inception V3, and ResNet50 as backbones, the multi-scale feature fusion model improves accuracy by 0.79, 1.02, and 0.50 percentage points, respectively, over the model based on a single feature. On the RAF-DB dataset, with the same backbones, the improvements are 1.13, 0.36, and 0.36 percentage points, respectively, verifying the effectiveness of multi-scale feature fusion.

(2) To further improve recognition performance, attention mechanisms are introduced into the network. An improved channel attention module and an improved spatial attention module are proposed, and the two are cascaded to form a hybrid attention module. On the FER2013 dataset, with VGG16, Inception V3, and ResNet50 as backbones, the multi-scale feature fusion model with hybrid attention improves accuracy by 1.33, 1.60, and 1.08 percentage points, respectively, over the multi-scale feature fusion model without hybrid attention. On the RAF-DB dataset, with the same backbones, the improvements are 1.35, 1.76, and 1.99 percentage points, respectively, verifying the effectiveness of the hybrid attention module.

(3) Because convolution is a local operation, the inherently small receptive field of a CNN limits its ability to understand a scene globally. To enlarge the receptive field, multi-head attention is introduced into the network, and a multi-scale feature information interaction and fusion module based on multi-head attention is proposed. On the RAF-DB dataset, with ResNet50 as the backbone, the multi-scale feature fusion model with multi-head attention achieves an accuracy of 82.99%, verifying the effectiveness of the interaction and fusion module.

(4) To further improve the quality of feature extraction, pyramid convolution is introduced into the multi-scale feature fusion model based on multi-head attention. A residual module composed of convolutions of different sizes and depths replaces the residual block in ResNet50 to improve the residual network. On the RAF-DB dataset, with ResNet50 as the backbone, the multi-head-attention multi-scale feature fusion model with pyramid convolution improves accuracy by 3.57 percentage points over the same model without pyramid convolution, verifying the effectiveness of pyramid convolution.
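The abstract's starting observation, that deeper layers see a larger receptive field but produce a lower-resolution feature map, can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `Layer` type, the `trace` function, and the toy VGG-style stack are hypothetical and are not taken from the thesis. It assumes a 48x48 input (the native FER2013 image size) and applies the standard recurrences for output size and receptive field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One square convolution or pooling layer (hypothetical helper type)."""
    kernel: int
    stride: int = 1
    padding: int = 0


def trace(input_size, layers):
    """Return (feature_map_size, receptive_field) after each layer.

    Standard recurrences for square inputs and kernels:
      size_out = floor((size_in + 2*padding - kernel) / stride) + 1
      rf_out   = rf_in + (kernel - 1) * jump_in
      jump_out = jump_in * stride
    """
    size, rf, jump = input_size, 1, 1
    history = []
    for layer in layers:
        size = (size + 2 * layer.padding - layer.kernel) // layer.stride + 1
        rf += (layer.kernel - 1) * jump
        jump *= layer.stride
        history.append((size, rf))
    return history


# Toy VGG-style stack: an assumption for illustration, not the thesis's network.
conv = Layer(kernel=3, stride=1, padding=1)
pool = Layer(kernel=2, stride=2)
stack = [conv, conv, pool, conv, conv, pool, conv, conv, conv, pool]

for depth, (size, rf) in enumerate(trace(48, stack), start=1):
    print(f"layer {depth:2d}: feature map {size}x{size}, receptive field {rf}x{rf}")
```

In this toy stack the first layer keeps a 48x48 map in which each position sees only a 3x3 patch, while the last layer produces a 6x6 map in which each position sees a 44x44 patch. Shallow maps keep detail but little context, deep maps the reverse, which is the trade-off that multi-scale feature fusion is meant to balance.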
Keywords/Search Tags: facial expression recognition, multi-scale feature fusion, attention mechanism, pyramid convolution, residual network, deep learning