
Multi-scale 3D Residual Attention Network For Facial Expression Recognition

Posted on: 2021-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Wang
GTID: 2518306470986769
Subject: Control Engineering
Abstract/Summary:
Facial expression is one of the most important signals in human communication. In recent years, with the development of pattern recognition technology, facial expression recognition (FER) has become a popular research topic in artificial intelligence and has been applied in fields such as human-computer interaction, intelligent monitoring, and driver fatigue detection. Most FER methods focus on static expression images, which contain only limited texture and contour information and therefore lead to low accuracy and poor generalization. Facial expression, however, is essentially a dynamic process. A video sequence of an expression contains rich spatial information as well as temporal context, and exploiting it is important for improving the accuracy of FER.
In this thesis, we propose a multi-scale spatio-temporal convolutional residual attention network (3D-ResAttNet) for facial expression recognition, whose architecture combines an attention mechanism with a feed-forward network. The network is built by stacking attention modules that generate attention-aware features. As the modules go deeper, the attention-aware features from different modules change adaptively, which improves recognition accuracy. Each attention module uses a bottom-up, top-down feed-forward structure to apply soft weights to the features, so that the network learns the features most relevant to facial expression recognition. To prevent the performance degradation that comes with deeper networks, we adopt attention residual learning. Because 3D CNNs are computationally expensive, training time and hardware requirements increase considerably. To reduce model complexity, this thesis adopts the idea of separable convolution and designs a pseudo three-dimensional residual block, P3D-A, for feature extraction. Compared with an ordinary three-dimensional convolutional neural network, the pseudo three-dimensional network has far fewer parameters and much lower space complexity. To address the shortage of valid training data, we augment the training samples by rotation and multi-scale transformation, which both reduces overfitting of 3D-ResAttNet and makes the network scale-invariant.
Unlike most FER methods, this method extracts spatial and temporal features simultaneously. The network structure is simple and is trained end to end. We conduct experiments on three benchmark datasets, CK+, Oulu-CASIA and MMI, on which the Top-1 accuracy reaches 99.6%, 92.4% and 81.2% respectively. The results show that the proposed method outperforms state-of-the-art methods, which indicates that 3D-ResAttNet has a strong ability to represent the spatio-temporal information of facial expressions and improves the accuracy of FER.
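The parameter reduction claimed for the pseudo three-dimensional block, and the attention residual rule, can be illustrated with a short sketch. It is not the thesis implementation. It assumes that P3D-A replaces one 3x3x3 convolution with a 1x3x3 spatial convolution followed by a 3x1x1 temporal convolution, that input and output channel counts are equal, that biases and any bottleneck layers are ignored, and that attention residual learning takes the form H = (1 + M) * T, with T the trunk features and M a soft mask in [0, 1]. All function names are hypothetical.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a 3D convolution with a kt x kh x kw kernel (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def full_3d_params(channels, k=3):
    """One ordinary k x k x k 3D convolution (assumed baseline)."""
    return conv3d_params(channels, channels, k, k, k)


def p3d_a_params(channels, k=3):
    """Assumed P3D-A factorization: 1 x k x k spatial conv, then k x 1 x 1 temporal conv."""
    spatial = conv3d_params(channels, channels, 1, k, k)
    temporal = conv3d_params(channels, channels, k, 1, 1)
    return spatial + temporal


def attention_residual(trunk, mask):
    """Assumed attention residual rule H = (1 + M) * T, applied elementwise.

    A mask value of 0 leaves the trunk feature unchanged instead of zeroing it,
    which is what keeps stacked attention modules from degrading the signal.
    """
    return [(1.0 + m) * t for t, m in zip(trunk, mask)]


if __name__ == "__main__":
    c = 64  # illustrative channel count, not taken from the thesis
    full = full_3d_params(c)
    p3d = p3d_a_params(c)
    print("3x3x3 conv weights:", full)
    print("P3D-A style weights:", p3d)
    print("ratio:", p3d / full)
    print(attention_residual([0.5, -1.0, 2.0], [0.0, 0.5, 1.0]))
```

Under these assumptions the factorized pair needs 9 + 3 = 12 weights per channel pair instead of 27, so the count falls to 4/9 of the full 3D convolution regardless of the channel width.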
Keywords/Search Tags: Facial Expression Recognition, 3D Convolutional Neural Network, Attention Network, Residual Learning, Spatial-Temporal Features