As a research hotspot, facial expression recognition has been widely applied in fields such as autonomous driving, social networking, and classroom teaching. It involves the cross-integration of multiple disciplines, including computer science, biology, and psychology, and is a valuable research topic. Because expressions are subtle, the expressions on faces are not easy to distinguish, so real-time facial expression recognition remains a difficult problem. Moreover, this paper focuses on video data, which is more complicated than single-frame images. To address these problems, this paper optimizes the traditional VGG-16 + LSTM network framework, aiming to improve the accuracy of video facial expression recognition. The specific work is as follows:

(1) To address the inaccurate expression feature extraction of traditional networks, this paper proposes the Feature Enhancement Convolutional Neural Network (FECNN), which improves the accuracy of video expression recognition from the perspectives of single-frame feature enhancement and inter-frame feature enhancement. First, a 7×7 convolutional layer is added to the middle of VGG-16 to extract shallow facial expression features, which are fused with the deep features to increase the spatial information of facial expressions. Then, an atrous (dilated) convolution with a dilation rate of 2 is applied in the last layer of VGG-16, which enlarges the receptive field of the convolution operation while reducing information loss. Next, a Squeeze-and-Excitation mechanism assigns weights to the expression feature channels to improve the accuracy of single-frame features. Finally, a self-attention mechanism is introduced to weight video frames according to the correlation between frames, improving the accuracy of multi-frame expression features (see the first sketch below). The approach is verified and compared on the AFEW, CK+, SFEW, and FER2013 datasets, confirming the superiority of the model.

(2) Because the inter-frame attention mechanism in the feature enhancement convolutional network is not suitable for processing long sequences and does not extract rich information, this paper replaces it with the multi-head attention mechanism from the Transformer and, to address the information dropout caused by deviations in the attention weights, proposes the Multi-Head Prior Attention Mechanism (MHPAM). The model is simulated and verified on the AFEW and CK+ datasets, confirming that MHPAM can improve the accuracy of facial expression recognition.

(3) Building on the replacement of the inter-frame attention mechanism with multi-head attention in (2), the MHPAM further combines the output feature map of the multi-head attention mechanism with the VGG-16 output feature map, using the VGG-16 features for feature supplementation and attention-weight guidance, which greatly enriches the extracted features (see the second sketch below). The model is verified by simulation on the AFEW and CK+ datasets, confirming its superiority.
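The single-frame enhancements described in (1) can be illustrated in code. The following is a minimal PyTorch sketch, assuming illustrative channel sizes, feature-map shapes, and fusion point (these are not the thesis's exact configuration): a 7×7 convolutional branch on a mid-level feature map fused with the deep features, a dilated convolution with rate 2, and a Squeeze-and-Excitation block that reweights channels.

import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: learn per-channel weights from globally pooled features."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))        # squeeze: global average pooling
        return x * w.view(b, c, 1, 1)          # excite: reweight each channel


class FrameEnhancer(nn.Module):
    """Fuse a shallow 7x7-conv branch with deep features, then dilated conv + SE."""
    def __init__(self, mid_channels=256, deep_channels=512):
        super().__init__()
        # shallow branch: large 7x7 kernel on a mid-level feature map (shapes assumed)
        self.shallow = nn.Conv2d(mid_channels, deep_channels,
                                 kernel_size=7, stride=2, padding=3)
        # atrous convolution with dilation rate 2: larger receptive field, same resolution
        self.dilated = nn.Conv2d(deep_channels, deep_channels,
                                 kernel_size=3, padding=2, dilation=2)
        self.se = SEBlock(deep_channels)

    def forward(self, mid_feat, deep_feat):
        shallow = self.shallow(mid_feat)        # shallow spatial detail
        fused = deep_feat + shallow             # fuse shallow and deep cues
        return self.se(self.dilated(fused))     # enlarge receptive field, reweight channels


mid = torch.randn(1, 256, 28, 28)       # assumed mid-level VGG-16 feature map
deep = torch.randn(1, 512, 14, 14)      # assumed deep VGG-16 feature map
enhanced = FrameEnhancer()(mid, deep)   # -> (1, 512, 14, 14)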
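For the inter-frame modelling in (2) and (3), the sketch below applies multi-head attention across pooled per-frame features and adds the CNN features back to the attention output as a stand-in for the feature supplementation described above. The exact form of MHPAM's prior weight guidance is not given in this summary, so the residual fusion, dimensions, and head count here are assumptions.

import torch
import torch.nn as nn


class FrameMultiHeadAttention(nn.Module):
    """Multi-head attention over a sequence of per-frame CNN features."""
    def __init__(self, feat_dim=512, num_heads=8):
        super().__init__()
        self.mha = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim) pooled per-frame CNN features
        attended, _ = self.mha(frame_feats, frame_feats, frame_feats)
        # supplement the attention output with the original CNN features
        return self.norm(attended + frame_feats)


feats = torch.randn(4, 16, 512)            # 4 clips, 16 frames, 512-dim per-frame features
out = FrameMultiHeadAttention()(feats)     # -> (4, 16, 512), then fed to an LSTM / classifier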