Expression Recognition Based On Adaptive Fusion Of Local And Global Features In Real Scenes

Posted on: 2023-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: F P Zhang
GTID: 2568306836472114
Subject: Electronic and communication engineering
Abstract/Summary:
Facial expression is an important channel for conveying human emotion. With the rapid development of computer technology and the spread of deep learning, more and more researchers have begun to use deep learning techniques for facial expression recognition. In recent years, the collection of a series of facial expression datasets has promoted the rapid development of this field. By collection scenario, these datasets fall into two types: datasets collected under controlled laboratory conditions and datasets collected in real scenes. Real-scene datasets have attracted much attention in recent work. They are often crawled from the web, contain large amounts of data, and are affected by many factors such as occlusion and pose variation. Facial occlusion and pose variation are two major challenges for facial expression recognition in real scenes, and they are the focus of this thesis. The main contents are as follows:

(1) To address facial occlusion and pose variation in expression recognition in real scenes, an expression recognition method based on adaptive fusion of local and global features is proposed. The method first constructs an expression recognition model consisting of a region cropping module, a feature extraction module, a feature fusion module, and a classification layer. The feature extraction module extracts the features of the whole face image and of its multiple local region images. The feature fusion module learns attention weights for these image features through an attention mechanism and, based on the weights, adaptively selects the important features for weighted fusion. In the fusion process, low-weight image features affected by occlusion or pose variation are discarded, which suppresses and eliminates their adverse effect on expression recognition. The constructed model is then trained on the training samples of the expression dataset. During training, an attention weight constraint added to the loss function forces the model to pay more attention to local face images that are more discriminative than the whole face image. Finally, the model's performance is verified on test samples from the expression dataset. The experimental results show that the accuracy of the model on the FERPlus dataset and the RAF-DB dataset is 3.96% and 4.11% higher than the baseline, reaching 89.10% and 87.19%, respectively.

(2) To address information redundancy in local region images, a hybrid domain attention mechanism module is proposed. The module first introduces a spatial domain attention mechanism, which emphasizes important spatial features according to the trained spatial domain attention weights. It then introduces a channel domain attention mechanism, which emphasizes important channel features according to the trained channel domain attention weights. Finally, it fuses the two mechanisms so that important features are extracted from the spatial and channel dimensions at the same time. In addition, the hybrid domain attention module uses soft pooling instead of max pooling and average pooling, which avoids problems such as lost features, local distortion, and the equal contribution that average pooling gives every value. With the hybrid domain attention module, the model can effectively suppress irrelevant features in local region images, which helps it classify expression images in real scenes. The experimental results show that the accuracy of the model on the FERPlus dataset and the RAF-DB dataset is 3.96% and 4.11% higher than the baseline, reaching 89.10% and 87.19%, respectively.

(3) To explore the performance of the proposed expression recognition model on samples with occlusion and large pose variation, an occlusion test set and a pose variation test set were obtained by filtering the original test sets of FERPlus and RAF-DB. The occlusion test set contains facial expression images with occlusions such as glasses and scarves, and the pose variation test set mainly contains facial expression images with the head looking up, looking down, or turned. In the experimental part of the thesis, ablation experiments for both the proposed LGAF model and the hybrid domain attention module are run on these test sets. They show that each module of the model is beneficial under face occlusion and pose variation. On the FERPlus occlusion and pose variation test sets, the accuracy of the final model increased by 11.15% and 5.56% over the baseline, reaching 84.17% and 80.72%, respectively. On the RAF-DB occlusion and pose variation test sets, accuracy increased by 5.20% and 4.00% over the baseline, reaching 83.48% and 85.80%, respectively.
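The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it names: soft pooling and attention-weighted fusion that discards low-weight features. Everything specific here is an assumption rather than the thesis's method: the function names `soft_pool` and `fuse_features`, the sigmoid used to turn scores into weights, the top-`keep` selection rule, and the hand-set scores (a real model would learn them). Soft pooling is written in its usual form, a softmax-weighted sum of the activations in a pooling window.

```python
import math


def soft_pool(window):
    """Softmax-weighted sum of the activations in one pooling window."""
    m = max(window)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in window]
    total = sum(exps)
    return sum(e / total * a for e, a in zip(exps, window))


def fuse_features(features, scores, keep=2):
    """Fuse feature vectors by attention weight, keeping the top `keep`.

    features: equal-length vectors (whole face plus local regions)
    scores:   one raw attention score per vector (hand-set here)
    keep:     hypothetical number of highest-weight vectors to retain
    """
    # Sigmoid maps each raw score to a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    # Indices of the `keep` largest weights; the others are discarded.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = order[:keep]
    norm = sum(weights[i] for i in kept)
    dim = len(features[0])
    # Weighted average of the retained vectors, one dimension at a time.
    fused = [
        sum(weights[i] * features[i][d] for i in kept) / norm
        for d in range(dim)
    ]
    return fused, weights, sorted(kept)


if __name__ == "__main__":
    # Third vector stands in for an occluded region with a very low score.
    feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    fused, weights, kept = fuse_features(feats, [0.0, 0.0, -10.0], keep=2)
    print(kept, fused)
    print(soft_pool([0.0, 2.0]))
```

In this toy run the third vector receives a near-zero weight and is dropped, so it does not influence the fused feature. The soft-pooled value of a window lies between its average and its maximum, which is the sense in which soft pooling sits between the two standard pooling operations.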
Keywords/Search Tags: Expression recognition, Attention mechanism, Feature fusion strategy, Local occlusion, Pose variations