Research On Dynamic Expression Recognition Based On 3D Spatial-temporal Convolution

Posted on: 2021-10-21  Degree: Master  Type: Thesis
Country: China  Candidate: K L Zhong  Full Text: PDF
GTID: 2518306308974469  Subject: Information and Communication Engineering
Abstract/Summary:
Facial Expression Recognition (FER) is a technology that extracts expression information from facial images in order to judge emotion. As a highly usable biometric recognition technology, FER is widely applied in human-computer interaction scenarios such as social robots, beauty scheme selection and driver fatigue monitoring. In recent years, the rapid development of artificial intelligence has driven the renewal of expression recognition schemes, and FER based on deep learning has attracted wide attention from academia and industry. As a research branch of FER, dynamic FER aims to assign the most appropriate label to a video sequence, and how to establish the spatial-temporal connections between frames is a hot research topic. In this thesis, 3-Dimensional Convolutional Networks (3D ConvNets) are used to construct spatial-temporal correlations between frames. 3D ConvNets have two major problems: explicit temporal relations are difficult to establish, and the networks are prone to overfitting. This thesis proposes a solution to each problem and carries out detailed experimental simulations. The main work covers two aspects, summarized as follows:

1. Attention learning mechanisms are designed in 3D ConvNets to optimize the network structures.
(1) A channel attention mechanism is designed to calculate the relations among different channels explicitly. These relations are then applied to the original features to generate weighted features in which the outputs of important channels are strengthened and the outputs of redundant channels are suppressed. A residual link and a weighted connection are also designed to integrate the weighted features with the original features.
(2) A time attention mechanism is designed to calculate the explicit temporal relations between frames. The spatial information of the other frames is loaded into the current frame according to these relations, so the current frame can perceive the overall sequential information of the video in advance. The time attention mechanism expands the information perception field in the time dimension and avoids relying entirely on implicit relational reasoning in 3D ConvNets.

2. Cross-dimension transfer learning is proposed to ease the problem of overfitting in 3D ConvNets.
(1) 2D spatial models are expanded with an additional time dimension to generate 3D spatial-temporal models. The simulated 3D models are then loaded into 3D ConvNets to avoid training from scratch. Cross-dimension transfer learning accelerates training convergence and effectively eases the overfitting problem in dynamic FER.
(2) Spatial Feature Retention Rate (SR-Rate) is proposed to describe the information correlation between the simulated spatial-temporal features and the original spatial features. Experiments show that the recognition performance of the spatial-temporal network is positively related to SR-Rate.
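The channel and time attention mechanisms in item 1 share one pattern: compute explicit pairwise relations among a set of units (channels or frames), use those relations to mix the units, then merge the result with the original features. The following is a minimal, dependency-free sketch of that pattern, not the thesis's implementation. The scaled dot-product softmax used as the relation measure, the names `axis_attention`, `frame_units` and `channel_units`, the `feat[c][t][p]` layout and the mixing weight `gamma` are all assumptions made for illustration; the abstract does not specify them.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def axis_attention(units, gamma=0.5):
    """Self-attention over N equally sized feature vectors.

    units: list of N vectors (lists of D floats); one vector per frame for
           time attention, one per channel for channel attention.
    gamma: hypothetical weight of the attended features (assumption).
    Returns (output, relations); relations is the explicit N x N matrix.
    """
    n, dim = len(units), len(units[0])
    # Explicit pairwise relations: scaled dot product, normalised per row.
    scores = [[sum(a * b for a, b in zip(units[i], units[j])) / math.sqrt(dim)
               for j in range(n)] for i in range(n)]
    relations = [softmax(row) for row in scores]
    output = []
    for i in range(n):
        # Load the information of all units into unit i by their relations.
        mixed = [sum(relations[i][j] * units[j][d] for j in range(n))
                 for d in range(dim)]
        # Residual link (x) plus weighted connection (gamma * mixed).
        output.append([x + gamma * m for x, m in zip(units[i], mixed)])
    return output, relations

def frame_units(feat):
    """feat[c][t][p] -> one flattened vector per frame t."""
    channels, frames = len(feat), len(feat[0])
    return [[v for c in range(channels) for v in feat[c][t]]
            for t in range(frames)]

def channel_units(feat):
    """feat[c][t][p] -> one flattened vector per channel c."""
    return [[v for frame in channel for v in frame] for channel in feat]

# Toy feature map: 2 channels, 3 frames, 2 spatial positions per frame.
feat = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]],
    [[0.2, 0.1], [0.1, 0.2], [0.3, 0.3]],
]
time_out, time_rel = axis_attention(frame_units(feat))    # 3 x 3 relations
chan_out, chan_rel = axis_attention(channel_units(feat))  # 2 x 2 relations
```

With `gamma=0` the function returns its input unchanged, so in this sketch the attention branch acts as an additive correction on top of the original features; in a trained network `gamma` could be a learned parameter.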
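Item 2(1) describes expanding 2D spatial models along a new time dimension. The abstract does not say how the weights are expanded. One common choice, assumed here, is to repeat each 2D kernel along the time axis and divide by the temporal depth, so that the 3D kernel reproduces the 2D response on a video made of identical frames. The helper names below are hypothetical, and the plain-Python convolutions are unpadded cross-correlations written only to make the check runnable.

```python
def inflate_kernel(kernel2d, depth):
    """Repeat a 2D kernel `depth` times along time, scaled by 1/depth.

    This repeat-and-rescale rule is an assumption, not the thesis's
    stated procedure.
    """
    return [[[w / depth for w in row] for row in kernel2d]
            for _ in range(depth)]

def conv2d_valid(image, kernel):
    """Unpadded 2D cross-correlation of image[y][x] with kernel[i][j]."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)] for y in range(out_h)]

def conv3d_valid(video, kernel):
    """Unpadded 3D cross-correlation of video[t][y][x] with kernel[d][i][j]."""
    kt = len(kernel)
    out = []
    for t in range(len(video) - kt + 1):
        # A 3D response is the sum of 2D responses over the temporal window.
        slices = [conv2d_valid(video[t + d], kernel[d]) for d in range(kt)]
        out.append([[sum(s[y][x] for s in slices)
                     for x in range(len(slices[0][0]))]
                    for y in range(len(slices[0]))])
    return out

# A 2D kernel standing in for pretrained spatial weights (toy values).
kernel2d = [[1.0, 0.0], [0.0, -1.0]]
kernel3d = inflate_kernel(kernel2d, depth=3)

# A "static" video: the same 3 x 3 frame repeated 4 times.
frame = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
video = [frame] * 4

spatial = conv2d_valid(frame, kernel2d)           # 2 x 2 response
spatiotemporal = conv3d_valid(video, kernel3d)    # 2 time steps of 2 x 2
```

On this static clip every time step of `spatiotemporal` matches `spatial` up to floating-point rounding, which is the property that lets the 3D network start from the 2D model's spatial behaviour instead of from random weights.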
Keywords/Search Tags: 3D convolution, Dynamic facial expression recognition, Attention mechanism, Transfer learning