
Research On Facial Expression Recognition Methods In Natural Scene

Posted on: 2023-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Zhao
Full Text: PDF
GTID: 2568306758466004
Subject: Control Science and Engineering

Abstract/Summary:
Facial expression is one of the most effective and universal ways for humans to convey emotions and intentions. Facial expression recognition (FER) is therefore an important foundation for emotion understanding and analysis, a prerequisite for computers to understand human emotions and convey their own, and an effective way to endow machines with emotion and bridge the gap between machines and humans. As FER tasks gradually move from controlled laboratory environments to challenging natural scenes, interference from expression-independent factors (e.g., illumination, occlusion, and pose variation) in open-world environments seriously degrades recognition accuracy. Moreover, most FER models contain a huge number of parameters and incur high computational overhead, which is unacceptable in natural-scene applications. In addition, learning robust spatio-temporal facial features in natural scenes is crucial for video expression analysis. This thesis studies these problems in depth and proposes the following methods:

A global multi-scale and local attention network (MA-Net) is proposed for FER in the wild. The network consists of three main components: a feature pre-extractor, a multi-scale module, and a local attention module. The feature pre-extractor extracts middle-level features; the multi-scale module fuses features with different receptive fields, reducing the susceptibility of deeper convolutions to occlusion and pose variation; and the local attention module guides the network to focus on local salient features, mitigating the interference of occlusion and non-frontal poses on FER in the wild. Extensive experiments demonstrate that MA-Net achieves state-of-the-art results on four in-the-wild FER benchmarks.

An efficient and robust FER network named EfficientFace is proposed, which holds far fewer parameters yet is more accurate and robust for FER in the wild. First, to improve the robustness of the lightweight network, a local-feature extractor and a channel-spatial modulator are designed, making the network aware of both local and globally salient facial features. Then, considering that most emotions occur as combinations, mixtures, or compounds of the basic emotions, a simple but effective label distribution learning (LDL) method is introduced as a novel training strategy. Experiments on realistic occlusion and pose-variation datasets demonstrate that EfficientFace remains robust under occlusion and pose-variation conditions, and the method achieves state-of-the-art results on three benchmarks.

A dynamic FER transformer (Former-DFER) is proposed for the in-the-wild scenario. Former-DFER mainly consists of a convolutional spatial transformer (CS-Former) and a temporal transformer (T-Former). The CS-Former, built from five convolution blocks and three spatial encoders, guides the network to learn occlusion- and pose-robust facial features from the spatial perspective; the T-Former, built from three temporal encoders, allows the network to learn contextual facial features from the temporal perspective. Heatmaps of the learned facial features demonstrate that Former-DFER is capable of handling issues such as occlusion, non-frontal pose, and head motion, and visualization of the feature distribution shows that the method learns more discriminative facial features. Former-DFER also achieves state-of-the-art results on the DFEW and AFEW benchmarks.
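The core idea behind the local attention module described above can be illustrated with a minimal sketch: split a feature vector into local patches, score each patch, and gate it with a sigmoid weight so that an uninformative (e.g., occluded) region contributes less to the final representation. This is only an illustrative toy, not the exact MA-Net module; the function and its parameters are assumptions for exposition.

```python
import math

def local_attention(features, num_patches=4):
    """Sketch of local-attention gating: split a flat feature vector
    into equal-length local patches, score each patch by its mean
    activation, and re-weight the patch with a sigmoid gate in (0, 1).
    Returns the re-weighted features and the per-patch weights."""
    patch_len = len(features) // num_patches
    patches = [features[i * patch_len:(i + 1) * patch_len]
               for i in range(num_patches)]
    # One sigmoid gate per patch: low-activation (e.g., occluded)
    # patches receive small weights and are suppressed.
    weights = [1.0 / (1.0 + math.exp(-sum(p) / len(p))) for p in patches]
    attended = [x * w for p, w in zip(patches, weights) for x in p]
    return attended, weights

# Toy feature vector: the second patch looks "occluded" (negative
# activations), the third looks salient (strong activations).
feats = [0.5, 1.0, -2.0, -1.0, 3.0, 2.0, 0.0, 0.1]
attended, weights = local_attention(feats, num_patches=4)
```

In a real network the gates would be learned from the features rather than taken from raw means, but the re-weighting structure is the same: salient local regions are emphasized and degraded ones are suppressed before features are aggregated.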
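The label distribution learning strategy mentioned above can likewise be sketched in a few lines: instead of training against a hard one-hot label, the annotated emotion keeps most of the probability mass while the remainder is spread over the other basic emotions, and the network is trained to match this distribution (e.g., with a KL-divergence objective). The uniform-spreading scheme and the `eps` parameter below are simplifying assumptions, not the thesis's exact formulation.

```python
import math

def to_label_distribution(label_idx, num_classes=7, eps=0.1):
    """Turn a hard one-hot label into a soft label distribution:
    the annotated class keeps mass (1 - eps) and the remaining eps
    is spread uniformly over the other classes."""
    off = eps / (num_classes - 1)
    return [1.0 - eps if i == label_idx else off for i in range(num_classes)]

def kl_loss(target_dist, pred_dist):
    """KL(target || prediction), a common LDL training objective."""
    return sum(t * math.log(t / p)
               for t, p in zip(target_dist, pred_dist) if t > 0)

# Sample annotated with basic emotion index 1; compare against a
# uniform prediction over the 7 basic emotions.
target = to_label_distribution(label_idx=1)
uniform = [1.0 / 7] * 7
```

This captures the abstract's motivation that real expressions are often mixtures of basic emotions: the soft target penalizes the network less for assigning probability to related classes than a one-hot cross-entropy target would.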
Keywords/Search Tags: Facial expression recognition, multi-scale, attention, Transformer