Research On Facial Expression Recognition Based On Deep Convolutional Neural Network

Posted on: 2024-01-02 | Degree: Master | Type: Thesis
Country: China | Candidate: J Liu
GTID: 2568307061989879 | Subject: Electronic Science and Technology
Abstract/Summary:
People routinely gauge one another's feelings by watching their faces: facial expression is one of the most direct and natural external signals through which humans convey emotional states and intentions. Facial Expression Recognition (FER) technology studies how to obtain and use the information conveyed by facial expressions automatically, reliably, and efficiently, and then to infer people's emotional intentions quickly and accurately. It is widely used in fields such as human-computer interaction, security monitoring, virtual reality, and medical diagnosis.

The key to FER is extracting facial expression features effectively. Traditional machine learning relies on hand-crafted features, which require substantial prior knowledge and introduce subjectivity. With the rapid development of deep learning, convolutional neural networks that learn deeper and more abstract semantic representations have brought great progress. However, facial expression images captured in natural scenes are often disturbed by non-expression factors such as facial pose, illumination, and occlusion, which seriously degrade recognition accuracy. In addition, a single static facial expression image often carries complex emotional intentions that single-label datasets cannot describe, which leads to ambiguous expressions. Finally, mainstream deep FER models usually contain a huge number of parameters, and their high computational overhead is hard to accept in practical natural-scene deployments. Therefore, besides continuing to improve recognition accuracy, FER models should also reduce their parameter overhead.

To address these problems, building on the basic theory of facial expression recognition and a study of three modern convolutional neural network algorithms, this thesis makes the following contributions:

1. A facial expression recognition method based on an improved residual network (MSP-ACB) is proposed to reduce the subjectivity of the pooling process and strengthen the convolution kernels. By constructing multi-scale importance pooling and asymmetric convolution blocks, the model extracts local detail features more effectively and adaptively learns and retains globally important features during downsampling. Specifically, in the convolutional layers, groups of asymmetric convolution kernels are fused, and the fused asymmetric convolution blocks are used to initialize the original network. The fused kernel parameters extract features more strongly by increasing the weight at the principal-axis positions, which helps the model extract features at important local positions of facial expression images. The thesis also proposes multi-scale importance pooling, which compresses features from the perspective of local importance. It adaptively learns importance weights for the input global features at multiple scales so that discriminative features are retained, avoiding the subjectivity of prior knowledge. The influence of the amplification coefficient on different datasets is studied and analyzed. MSP-ACB is evaluated on the RAF-DB and FER2013 facial expression datasets, reaching recognition accuracies of 85.53% and 72.72%, respectively. Both results improve on the original backbone network and on recent related methods. Ablation experiments verify the effectiveness of each module in MSP-ACB.

2. A lightweight facial expression recognition method based on spatial grouping enhanced attention (SPFER) is proposed. It improves recognition accuracy in natural environments while also reducing the number of model parameters. Specifically, in the shallow layers, a parallel deep convolutional residual structure combined with the ReLU6 activation function is designed to reduce redundancy and strengthen the model's representation of local facial-expression details, and the local features are fused with global features. In the deep layers, a spatial grouping enhanced attention mechanism is introduced to counter the unstable spatial distribution of features caused by noise in expression images. It highlights key regions of the semantic features and helps the deep network learn finer-grained expression details. To avoid overfitting and improve generalization, an improved depthwise separable convolution is used as the output of the backbone network without greatly increasing computational complexity. SPFER achieves recognition accuracies of 88.33%, 63.09%, and 60.12% on RAF-DB, AffectNet-7, and AffectNet-8, respectively, which is higher than recent facial expression recognition methods. The model is also compressed by 20% compared with the original backbone network.

3. A lightweight facial expression recognition method based on weight inference and label smoothing (Light-Weight FER) is proposed, which maintains high recognition performance at low computational complexity. On the feature-extraction side, the original network is analyzed and pruned to obtain a more streamlined and efficient model, which further reduces computational complexity while improving representation ability. A channel-spatial key-weight inference module based on max pooling is embedded in the model to strengthen the extraction of local detail features of facial expression images and to suppress non-expression features, at small computational overhead. On the training side, label smoothing supervises network learning without introducing additional information, reducing the adverse effect of ambiguous expressions on recognition performance. Light-Weight FER achieves recognition accuracies of 86.91%, 61.80%, and 58.75% on RAF-DB, AffectNet-7, and AffectNet-8, respectively, and still performs well with only 0.95 M parameters.
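Contribution 1 fuses groups of asymmetric convolution kernels into the original kernels. The abstract does not spell out the block, so the sketch below assumes an ACNet-style design. In that design, a square 3x3 branch and parallel 1x3 and 3x1 branches are summed during training. Because convolution is linear, the three branches can later be folded into a single 3x3 kernel by adding the asymmetric kernels onto its central row and column. The kernel sizes, all weights, and the test image are made-up assumptions, and BatchNorm folding is omitted. The check shows that the three summed branches and the single fused kernel give the same output.

```python
# Sketch of asymmetric convolution block (ACB) fusion, assuming an
# ACNet-style design with 3x3, 1x3, and 3x1 branches. BatchNorm folding omitted.


def conv_centered(image, kernel):
    """Stride-1 cross-correlation evaluated at every pixel that has a full
    3x3 neighbourhood, with the odd-sized kernel centred on that pixel."""
    kh, kw = len(kernel), len(kernel[0])
    rh, rw = kh // 2, kw // 2
    h, w = len(image), len(image[0])
    return [
        [
            sum(
                image[ci + a - rh][cj + b - rw] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for cj in range(1, w - 1)
        ]
        for ci in range(1, h - 1)
    ]


def fuse_acb(square, horizontal, vertical):
    """Fold a 1x3 and a 3x1 kernel into a 3x3 kernel by adding them onto its
    central row and central column (the kernel's horizontal and vertical axes)."""
    fused = [row[:] for row in square]
    for b in range(3):
        fused[1][b] += horizontal[0][b]
    for a in range(3):
        fused[a][1] += vertical[a][0]
    return fused


square = [[0.1, 0.2, 0.1], [0.0, 0.5, 0.0], [-0.1, 0.3, 0.2]]
horizontal = [[0.4, -0.2, 0.3]]      # 1 x 3 branch (made-up weights)
vertical = [[0.2], [0.1], [-0.3]]    # 3 x 1 branch (made-up weights)
image = [[float((3 * r + c) % 5) for c in range(5)] for r in range(5)]

# Training-time view: three parallel branches whose outputs are summed.
branch_sum = [
    [s + hz + v for s, hz, v in zip(row_s, row_h, row_v)]
    for row_s, row_h, row_v in zip(
        conv_centered(image, square),
        conv_centered(image, horizontal),
        conv_centered(image, vertical),
    )
]
# Inference-time view: one 3x3 convolution with the fused kernel.
fused_out = conv_centered(image, fuse_acb(square, horizontal, vertical))

max_diff = max(
    abs(x - y) for rx, ry in zip(branch_sum, fused_out) for x, y in zip(rx, ry)
)
print(max_diff < 1e-9)  # → True
```

The fused kernel's central cross carries the extra weight. This is one way to read "increasing the weight at the principal-axis positions".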
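The multi-scale importance pooling in contribution 1 is described only at a high level. A minimal sketch follows, assuming a local-importance-based pooling form: each activation in a pooling window gets the weight `exp(alpha * logit)`, and the output is the weight-normalized average. Several parts of this are assumptions. Treating the "amplification coefficient" as the logit multiplier `alpha` is one. Using the activation itself as the logit, in place of a learned logit module, is another. The sketch also works at a single scale only, and `importance_pool` is a hypothetical name.

```python
import math


def importance_pool(feature, alpha=1.0, window=2):
    """Importance-weighted pooling over non-overlapping windows (hypothetical sketch).

    Each value v gets weight exp(alpha * v); the pooled output is the
    weight-normalized average of the window. Using v itself as the logit is a
    stand-in for a learned logit module; alpha acts as an amplification factor.
    """
    h, w = len(feature), len(feature[0])
    out = []
    for i in range(0, h - window + 1, window):
        row = []
        for j in range(0, w - window + 1, window):
            vals = [feature[i + a][j + b] for a in range(window) for b in range(window)]
            m = max(vals)  # shift logits by the max for numerical stability (alpha >= 0)
            weights = [math.exp(alpha * (v - m)) for v in vals]
            row.append(sum(wt * v for wt, v in zip(weights, vals)) / sum(weights))
        out.append(row)
    return out


fmap = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 8.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
print(importance_pool(fmap, alpha=0.0))  # → [[2.5, 2.0], [0.0, 1.0]]
print(importance_pool(fmap, alpha=5.0))  # large alpha moves toward max pooling
```

With `alpha = 0` the weights are uniform and the operator reduces to average pooling. As `alpha` grows it approaches max pooling. The pooling behaviour therefore depends on this coefficient, which is consistent with the abstract's note that its influence differs across datasets.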
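For the spatial grouping enhanced attention in contribution 2, the sketch below assumes a Spatial Group-wise Enhance (SGE) style mechanism; the thesis's exact variant is not given in the abstract. The steps are:

1. Split the channels into groups.
2. Within each group, compare the group's spatially averaged descriptor with the local feature vector at every position.
3. Normalize these similarities over positions.
4. Apply a per-group scale `gamma` and shift `beta`, then a sigmoid, to get a spatial gate shared by the group's channels.

In a real network `gamma` and `beta` are learned; here they are hypothetical inputs.

```python
import math


def sge_attention(feature, groups, gamma, beta, eps=1e-5):
    """SGE-style spatial group attention (sketch).

    feature: list of C channels, each a flat list of H*W activations.
    gamma, beta: per-group scale and shift (learned parameters in practice).
    """
    channels = len(feature)
    assert channels % groups == 0, "channels must divide evenly into groups"
    per_group = channels // groups
    positions = len(feature[0])
    out = []
    for g in range(groups):
        group = feature[g * per_group:(g + 1) * per_group]
        # Global descriptor: spatial average of each channel in the group.
        desc = [sum(ch) / positions for ch in group]
        # Similarity between the descriptor and the local feature at each position.
        sim = [sum(desc[k] * group[k][p] for k in range(per_group)) for p in range(positions)]
        # Normalize similarities over positions to zero mean and unit variance.
        mean = sum(sim) / positions
        std = math.sqrt(sum((s - mean) ** 2 for s in sim) / positions)
        norm = [(s - mean) / (std + eps) for s in sim]
        # Per-position gate shared by all channels of this group.
        gate = [1.0 / (1.0 + math.exp(-(gamma[g] * n + beta[g]))) for n in norm]
        out.extend([[v * gate[p] for p, v in enumerate(ch)] for ch in group])
    return out


feature = [
    [1.0, 0.0, 0.0, 0.0],  # group 0: both channels fire only at position 0
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],  # group 1: a flat channel...
    [0.2, 0.4, 0.6, 0.8],  # ...and a channel that grows across positions
]
enhanced = sge_attention(feature, groups=2, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print([round(v, 3) for v in enhanced[2]])  # the flat channel is now weighted by position
```

The gate is computed from spatial statistics within each group. Positions that agree with the group's overall semantics are therefore amplified, and noisy or inconsistent positions are damped.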
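Contribution 2 relies on ReLU6 and a depthwise separable convolution to keep the model light. The sketch below shows where the savings come from, for a plain (not the thesis's "improved") depthwise separable convolution. A standard k x k convolution needs `k*k*c_in*c_out` weights. The depthwise separable version needs `k*k*c_in` weights for the depthwise step plus `c_in*c_out` for the 1x1 pointwise step. The layer sizes below are arbitrary examples, not values from the thesis.

```python
def standard_conv_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=False):
    """Weights in a depthwise (k x k per channel) + pointwise (1 x 1) pair."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise + ((c_in + c_out) if bias else 0)


def relu6(x):
    """ReLU clipped at 6, as used in lightweight mobile architectures."""
    return min(max(x, 0.0), 6.0)


print(standard_conv_params(3, 128, 256))        # → 294912
print(depthwise_separable_params(3, 128, 256))  # → 33920
print([relu6(v) for v in (-2.0, 3.5, 9.0)])     # → [0.0, 3.5, 6.0]
```

For bias-free layers the ratio is `1/c_out + 1/k**2`, about 0.115 here. This is why replacing standard convolutions with depthwise separable ones shrinks a model substantially.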
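Contribution 3 embeds a channel-spatial key-weight inference module based on max pooling. The internal layers of that module are not described in the abstract. The sketch below therefore assumes a CBAM-like ordering that keeps only max pooling. Channel weights come from the global max of each channel. Spatial weights come from the max across channels at each position, applied to the channel-refined features. The learned layers (typically a small MLP and a convolution) are replaced by hypothetical scalar weights `w_c`, `b_c`, `w_s`, and `b_s` purely for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def max_pool_weight_inference(feature, w_c=1.0, b_c=0.0, w_s=1.0, b_s=0.0):
    """Channel-then-spatial reweighting driven by max pooling (hypothetical sketch).

    feature: list of C channels, each a flat list of H*W activations.
    w_c, b_c, w_s, b_s: scalar stand-ins for the learned layers of a real module.
    """
    # Channel weights: global max pooling over positions, affine map, sigmoid.
    ch_w = [sigmoid(w_c * max(ch) + b_c) for ch in feature]
    refined = [[v * cw for v in ch] for ch, cw in zip(feature, ch_w)]
    # Spatial weights: max over channels at each position, affine map, sigmoid.
    positions = len(refined[0])
    sp_w = [sigmoid(w_s * max(ch[p] for ch in refined) + b_s) for p in range(positions)]
    return [[v * sp_w[p] for p, v in enumerate(ch)] for ch in refined]


feature = [
    [0.0, 4.0, 0.0],  # channel with a strong local peak
    [1.0, 1.0, 1.0],  # flat, weakly informative channel
]
out = max_pool_weight_inference(feature)
print([[round(v, 3) for v in ch] for ch in out])
```

Max pooling keeps the strongest response in each channel and at each position. Salient local details, such as the peak above, therefore drive the weights, and the operation adds almost no parameters.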
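The label smoothing in contribution 3 replaces a one-hot target with a softened distribution. The true class keeps `1 - epsilon` of the probability mass, and `epsilon` is spread uniformly over all `K` classes. The sketch assumes `epsilon = 0.1` and seven expression classes; the thesis's own smoothing factor is not stated in the abstract. It also shows that, for an overconfident prediction, the smoothed loss stays larger than the one-hot loss. This is the effect that discourages the network from committing fully to a single label for an ambiguous face.

```python
import math


def smooth_labels(target, num_classes, epsilon=0.1):
    """Soft target: 1 - epsilon on the true class plus epsilon / K on every class."""
    return [
        (1.0 - epsilon if k == target else 0.0) + epsilon / num_classes
        for k in range(num_classes)
    ]


def cross_entropy(logits, soft_target):
    """Cross-entropy between softmax(logits) and a (possibly soft) target."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))  # stable log-sum-exp
    return -sum(t * (z - log_z) for t, z in zip(soft_target, logits))


soft = smooth_labels(2, 7)
hard = smooth_labels(2, 7, epsilon=0.0)
print([round(p, 4) for p in soft])  # → [0.0143, 0.0143, 0.9143, 0.0143, 0.0143, 0.0143, 0.0143]

logits = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0]  # a very confident prediction
print(cross_entropy(logits, hard), cross_entropy(logits, soft))
```

No extra information is introduced: the soft target is derived entirely from the original single label, matching the abstract's description.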
Keywords/Search Tags: Facial expression recognition, Multi-scale feature, Feature fusion, Attention, Lightweight