Research On Dynamic Facial Expression Recognition Based On Two-stream Convolutional Neural Network

Posted on: 2022-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: F H Lv
Full Text: PDF
GTID: 2518306557969199
Subject: Signal and Information Processing
Abstract/Summary:
Commonly used emotion recognition methods infer a person's emotions from features such as speech, facial expression, posture, and physiological signals. Expression recognition is a relatively mature branch of emotion recognition. Traditional expression recognition methods are generally based on static images and are not well suited to dynamic expression recognition. With the rise of deep learning, research on dynamic expression recognition has grown steadily. To exploit the temporal and spatial information of a video simultaneously and improve the accuracy of dynamic expression recognition, the two-stream convolutional neural network has been applied to the task and has achieved a good recognition rate. Its shortcoming is that it ignores the correlation between the two streams. To strengthen the connection between the two streams and further improve the recognition rate of dynamic expressions, this thesis adds an attention mechanism to the traditional two-stream model and proposes two attention-based two-stream networks for dynamic expression recognition. The main research contents are as follows:

(1) A dynamic expression recognition method based on the two-stream network is studied. The two streams of the two-stream network are the temporal network and the spatial network. This thesis first studies the two sub-networks separately. The spatial stream takes a single frame of each video as input, and the temporal stream takes the optical flow between frames as input. Both use VGG16 model parameters pre-trained on the ImageNet database, and experiments are conducted on the neonatal pain expression database, the eNTERFACE database, and the AFEW database. The results show that the two-stream convolutional neural network, which combines temporal and spatial information, achieves a much higher recognition rate for dynamic expressions than either the temporal network or the spatial network alone.

(2) A two-stream convolutional neural network based on a shared-attention mechanism is proposed. To strengthen the correlation between the temporal network and the spatial network and to extract the emotional features useful for facial expression recognition, this thesis adds a shared-attention mechanism to the original two-stream network. The resulting network is used to recognize dynamic expressions in the three databases and obtains accuracy rates of 65.5%, 79.52%, and 58.75%, respectively, which are 3, 3.33, and 2.88 percentage points higher than those of the plain two-stream convolutional neural network. The accuracy of emotion recognition is thus improved to a certain extent.

(3) A two-stream convolutional neural network based on a cross-attention mechanism is proposed. To enable information exchange between the two streams and to remove emotional features that are irrelevant or even counterproductive to emotion recognition, this thesis combines the two-stream network with a cross-attention mechanism. Compared with the traditional two-stream network, the cross-attention two-stream network improves dynamic expression recognition significantly. Its accuracy on the three dynamic expression video databases reaches 66%, 80.95%, and 60.31%, which is 3.5, 4.76, and 4.44 percentage points higher than the plain two-stream network and 0.5, 1.43, and 1.56 percentage points higher than the shared-attention two-stream network. This shows that the cross-attention two-stream network makes fuller use of the temporal and spatial information of the video than the shared-attention two-stream network, and that the emotion recognition rate improves accordingly.

The above work shows that although the two-stream convolutional neural network uses information from both the spatial and temporal domains of the video, it ignores the connection between the two domains. Adding an attention mechanism between the two streams lets the information of the two domains interact, which yields emotional features that are more conducive to recognizing dynamic expressions and effectively improves the recognition rate.
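The reported figures can be cross-checked with simple arithmetic. The sketch below assumes that the accuracies and gains are listed in the same database order (neonatal pain expression, eNTERFACE, AFEW) and that the gains are absolute differences in percentage points. The baseline accuracies it derives are implied by the abstract, not stated in it, and all variable names are illustrative.

```python
# Accuracies (%) as reported in the abstract. The database order
# (neonatal pain expression, eNTERFACE, AFEW) is an assumption.
shared = [65.5, 79.52, 58.75]          # shared-attention two-stream network
cross = [66.0, 80.95, 60.31]           # cross-attention two-stream network

# Reported gains, read here as percentage-point differences.
shared_gain = [3.0, 3.33, 2.88]        # shared-attention vs. plain two-stream
cross_gain = [3.5, 4.76, 4.44]         # cross-attention vs. plain two-stream
cross_over_shared = [0.5, 1.43, 1.56]  # cross-attention vs. shared-attention


def implied_baseline(accuracies, gains):
    """Subtract each gain from its accuracy to recover the implied baseline."""
    return [round(a - g, 2) for a, g in zip(accuracies, gains)]


baseline_from_shared = implied_baseline(shared, shared_gain)
baseline_from_cross = implied_baseline(cross, cross_gain)
cross_minus_shared = [round(c - s, 2) for c, s in zip(cross, shared)]

# Both comparisons should imply the same plain two-stream accuracy.
print(baseline_from_shared)
print(baseline_from_cross)
print(cross_minus_shared)
```

Under these assumptions both comparisons imply the same plain two-stream accuracies, and the differences between the two attention variants match the reported gains, so the three sets of numbers are mutually consistent.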
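As a rough illustration of how a plain two-stream model can combine its spatial and temporal predictions, the following sketch averages the class probabilities of the two streams (late fusion). The abstract does not state which fusion rule the thesis uses, so the weighted average, the function names, and the `spatial_weight` parameter are assumptions made for this example only. The logits are made-up numbers, not results from the thesis.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(spatial_logits, temporal_logits, spatial_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    spatial_logits  -- scores from the stream fed with a single video frame
    temporal_logits -- scores from the stream fed with optical flow
    spatial_weight  -- hypothetical mixing weight in [0, 1]
    """
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    w = spatial_weight
    return [w * s + (1.0 - w) * t for s, t in zip(p_spatial, p_temporal)]


def predict(fused_probs):
    """Return the index of the most probable class."""
    return max(range(len(fused_probs)), key=fused_probs.__getitem__)


# Made-up scores for three expression classes.
spatial_logits = [2.0, 1.0, 0.0]
temporal_logits = [0.0, 1.0, 3.0]

fused = fuse_two_streams(spatial_logits, temporal_logits)
label = predict(fused)
```

In this form the two streams only meet at the final averaging step, which is the lack of interaction between streams that the attention mechanisms described above are meant to address.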
Keywords/Search Tags: Dynamic Expression Recognition, Two-stream Convolutional Neural Network, Transfer Learning, Attention Mechanism