
Research On Deep Learning For Facial Expression Recognition

Posted on: 2023-10-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X Y Tong    Full Text: PDF
GTID: 1528306914476484    Subject: Information and Communication Engineering
Abstract/Summary:
Facial Expression Recognition (FER) aims to understand underlying human emotions and to establish effective communication between humans and computers. FER has attracted great attention in the research community because of its applications in fields such as human-computer interaction and medical diagnosis. Traditional FER methods rely on hand-crafted features that are then passed to a classifier. However, because the emotional cues in facial expressions are complex and variable, traditional methods suffer from inadequate feature extraction and are susceptible to environmental interference on real-world datasets. In recent years, many researchers have proposed end-to-end deep-learning networks for FER. However, facial expression images in the real world contain many sources of interference, such as local occlusion, pose variation, complex illumination, and individual differences, so FER still has many open problems. In this dissertation, building on deep feature extraction, we propose three deep FER models with the goal of improving the accuracy and robustness of FER. The main innovations are as follows:

(1) Data augmentation and second-order pooling model for FER. Facial expression data are scarce because labeling is difficult, so reliably labeled samples are in short supply. Deep neural networks typically learn local features through repeated convolution and pooling, and finally obtain an image-level representation through global average pooling, which is then sent to the classifier. Global average pooling computes only the mean of the features and ignores their correlations. We therefore propose a data augmentation and covariance pooling model: online data augmentation (resizing, random flipping, random cropping, and random erasing) enlarges the sample set, while covariance pooling fully exploits the correlations between deep features; the Newton iteration method is used to compute the matrix square root, enabling fast end-to-end training. Experiments show that data augmentation effectively alleviates network overfitting, and the second-order features extracted by covariance pooling outperform first-order features, improving FER accuracy.

(2) Adaptive-weight overlapping-block network for FER. A deep neural network extracts global features from face images, but these are poorly suited to occluded faces and pose variation; local features handle such cases better. Traditional local feature extraction methods fall into two categories: one relies on accurate localization of facial key points, which incurs additional computational cost; the other partitions the original image into blocks, but the block locations are not accurate enough. To focus better on local facial features, we propose an adaptive-weight and feature-map-blocking model. On the feature map, each pixel is derived from a local region of the original image through a nonlinear transformation, so blocking the feature map yields more discriminative information and greater robustness to local noise. Considering the correlation and relative importance of the blocks, an adaptive-weight module models the blocked features, strengthening effective features and suppressing invalid ones. Experiments show that this model further improves FER accuracy and, by attending to small image regions, effectively alleviates the problems of local occlusion and pose variation.

(3) Disentangled region non-local neural network for FER. Existing global feature extraction methods for FER rely on stacked convolution and pooling to enlarge the receptive field, ignoring interactions between locations. We propose a disentangled region non-local network that directly captures long-range dependencies by computing interactions between locations and regions without being restricted to neighboring points, thereby preserving more information. We further extract clearer visual cues by disentangling the region non-local operation into two terms: a whitened pairwise term that models the connection between locations and region blocks, and an independent unary term that represents the saliency of each pixel. Extensive experiments and visualizations show that this model effectively exploits the connections between pixels and regions, and the disentangled design clarifies the contribution of each type of visual cue. The proposed network makes better use of spatial distribution information, provides more effective global features, and further improves FER accuracy.
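The second-order pooling of innovation (1) can be sketched numerically. The NumPy code below is an illustrative reconstruction, not the dissertation's implementation: it forms the channel covariance of a feature map and approximates its square root with the Newton-Schulz variant of the Newton iteration, which needs only matrix multiplications and is therefore cheap inside end-to-end training. The pre-scaling constant and iteration count are assumptions.

```python
import numpy as np

def covariance_pooling(features):
    """Second-order pooling: covariance of channel features.

    features: (C, N) array -- C channels observed at N spatial locations.
    Returns the (C, C) covariance matrix, which keeps the correlation
    information that global average pooling discards.
    """
    centered = features - features.mean(axis=1, keepdims=True)
    return centered @ centered.T / (features.shape[1] - 1)

def newton_schulz_sqrt(A, num_iters=10):
    """Approximate the square root of a symmetric positive-definite matrix.

    Newton-Schulz iteration: pre-scale A so the iteration converges,
    iterate with matrix products only, then undo the scaling.
    """
    n = A.shape[0]
    norm = np.linalg.norm(A)        # Frobenius norm for pre-scaling
    Y = A / norm                    # scaled matrix, eigenvalues in (0, 1]
    Z = np.eye(n)
    I = np.eye(n)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T                   # Y converges to (A / norm)^{1/2}
        Z = T @ Z                   # Z converges to (A / norm)^{-1/2}
    return Y * np.sqrt(norm)        # undo the pre-scaling
```

For example, `newton_schulz_sqrt(np.array([[2.0, 0.5], [0.5, 1.0]]))` returns a matrix `S` with `S @ S` close to the input after about ten iterations, without any eigendecomposition.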
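The overlapping-block scheme of innovation (2) can likewise be sketched. The snippet below is a simplified NumPy illustration under stated assumptions: the block size, stride, and the scoring vector `w` (standing in for the learned parameters of the adaptive-weight module) are hypothetical placeholders, and average pooling is used inside each block.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def overlapping_block_features(fmap, block=4, stride=2):
    """Slice a (C, H, W) feature map into overlapping blocks.

    Each block is average-pooled to a C-dim descriptor, so no key-point
    localization is needed and neighbouring blocks share context.
    Returns a (num_blocks, C) array.
    """
    C, H, W = fmap.shape
    descriptors = []
    for i in range(0, H - block + 1, stride):
        for j in range(0, W - block + 1, stride):
            descriptors.append(
                fmap[:, i:i + block, j:j + block].mean(axis=(1, 2)))
    return np.stack(descriptors)

def adaptive_weight_pool(descriptors, w):
    """Weight each block by an importance score, then pool.

    `w` is a hypothetical stand-in for the learned scoring parameters of
    the adaptive-weight module; occluded or uninformative blocks receive
    low scores and are suppressed in the pooled feature.
    """
    scores = sigmoid(descriptors @ w)   # per-block importance in (0, 1)
    scores = scores / scores.sum()      # normalize across blocks
    return scores @ descriptors         # (C,) weighted global feature
```

With an 8x8 feature map, block 4 and stride 2 give a 3x3 grid of nine overlapping blocks, so each local region is covered by several descriptors.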
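The disentangling in innovation (3) can be sketched as well. The NumPy code below is an assumed simplification, not the dissertation's model: it computes location-to-location attention rather than the region-block version, and the projection matrices `Wq`, `Wk`, `Wv` and saliency vector `wu` are hypothetical placeholders for learned parameters. It illustrates the split into a whitened pairwise term plus an independent unary (saliency) term.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def disentangled_nonlocal(x, Wq, Wk, Wv, wu):
    """Disentangled non-local block on flattened spatial positions.

    x: (N, C) features at N positions.  Attention is disentangled into
    a whitened pairwise term (pure position-to-position relations) plus
    a unary term (per-position saliency shared by all queries), then
    applied to the value features.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # pairwise term: whiten queries and keys by removing their means,
    # so this term encodes only relative (pairwise) information
    qw = q - q.mean(axis=0, keepdims=True)
    kw = k - k.mean(axis=0, keepdims=True)
    pairwise = softmax(qw @ kw.T / np.sqrt(q.shape[1]), axis=1)
    # unary term: saliency of each key position, independent of the query
    unary = softmax(k @ wu, axis=0)          # (N,)
    attention = pairwise + unary[None, :]    # sum of the two terms
    return attention @ v                     # (N, C) aggregated features
```

Because the pairwise term is mean-centered and the unary term depends only on the key position, the two terms capture complementary information, which is the point of the disentangled design.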
Keywords/Search Tags:facial expression recognition, deep learning, feature representation, data augmentation, non-local neural network