Research And Implementation Of Key Technologies Of Fast Facial Expression Recognition In The Wild

Posted on: 2021-02-01  Degree: Doctor  Type: Dissertation
Country: China  Candidate: P Jiang
GTID: 1488306311970999  Subject: Computer system architecture
Abstract/Summary:
Facial expression recognition (FER) is an interdisciplinary field that draws on psychology, biology, and computer science. Over the past decade, FER has remained an important topic in artificial intelligence because of its practical value and research significance. With improvements in hardware and deep learning, the focus of FER has shifted from controlled laboratory environments to real-world conditions, and the market for FER applications has grown vigorously. These applications in turn demand better FER techniques: higher recognition accuracy, fewer environmental constraints, faster computation, and smaller, more portable systems. In short, fast and portable FER is required both for harmonious human-computer interaction and artificial intelligence and by the application market. This dissertation explores a miniaturized automatic FER system for natural scenes and focuses on several key technologies. The major emphases and innovations are as follows:

1. A fast and accurate noise level estimation (NLE) method. Because of inherent defects in image sensors, digital images inevitably contain noise, which can cause subsequent face detection to fail and strongly reduces FER accuracy. Fast and accurate noise level estimation is therefore needed to decide whether an image requires denoising and to supply key parameters to denoising methods. Existing fast noise level estimators tend to overestimate the noise in low-noise images, especially high-frequency images (with abundant textures), and are therefore unsuitable for varied natural environments. This dissertation provides a noise-level estimator for additive white Gaussian noise using principal component analysis (PCA) and principal texture patches (PTPs). The major contributions toward addressing the challenges in the NLE literature are: 1) an analysis of the idea behind PCA-based NLE approaches and a model of the nonlinear relationship between the smallest eigenvalue, the true noise level, the number of chosen patches, and the block size; 2) on the principle that more samples lead to greater accuracy, the method selects as many PTPs as possible to compute the noise level, so it not only works well for various natural images over a large range of noise levels but is also more reliable and accurate.

2. Fast and efficient facial expression recognition using a Gabor convolutional network (GCN). Facial emotion changes are strongly related to certain regions of interest (ROIs), such as the mouth, eyes, eyebrows, and nose. In other words, facial expression features essentially come from facial ROIs. Gabor filters characterize spatial frequency structure and can therefore capture the features of facial ROIs efficiently, which is why they are widely used in FER. GCNs combine the advantages of Gabor filters and traditional convolutional neural networks (CNNs) and are therefore more proficient at extracting the features of facial ROIs. That is, GCNs can extract sufficient expression features with fewer convolutional units. Consequently, this dissertation presents a light GCN architecture for FER tasks, which has two major advantages: 1) its recognition performance is comparable to that of state-of-the-art networks such as VGG16 and ResNet-18; 2) it has very low computational and memory costs because it consists of only four Gabor convolutional layers and two linear layers.

3. Accurate and reliable facial expression recognition using an advanced softmax loss (ASL) with fixed weights. In the FER literature, the most widely used loss function is the softmax loss (SL). Under the supervision of SL, deeply learned features are likely to be separable in the angular space but not sufficiently discriminative. Hence, many deep embedding approaches have been presented to enhance the discriminative power of deep features by reducing intraclass variation and enhancing interclass differences. Starting with the sum of the cosine values of all pairwise angles, we deduce that in the ideal case the learned features are uniformly distributed in the angular domain and the cosine of every pairwise angle equals 1/(1-n), where n is the number of categories. Based on this idea, this dissertation presents an ASL that mitigates the bias induced by data imbalance and hence increases accuracy and reliability. The proposed ASL essentially magnifies the interclass diversity in the angular space to enhance the discriminative power in every category. The proposed loss can easily be implemented in any deep network. Extensive experiments demonstrate that ASL is significantly more accurate and reliable than many state-of-the-art approaches and that it can easily be plugged into other methods to improve their performance.

4. Based on the techniques presented in this dissertation, we designed an embedded in-the-wild FER system and implemented it on an Nvidia Jetson AGX Xavier. Processing a 640 × 480 RGB image takes nearly 0.3 seconds (about 0.1 seconds for the GPU), which is faster than some other FER systems.
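The basic idea behind PCA-based noise level estimation mentioned in contribution 1 can be illustrated with a minimal sketch. This is not the dissertation's estimator: it omits the principal texture patch selection and the nonlinear correction model, and the function names, patch size, and synthetic test image are all assumptions made for illustration. It only shows that, when the clean patches occupy a low-dimensional subspace, the smallest eigenvalue of the patch covariance matrix approximates the noise variance.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def extract_patches(img, size=5, step=1):
    """Return all size x size patches of a 2-D image as flattened rows."""
    windows = sliding_window_view(img, (size, size))[::step, ::step]
    return windows.reshape(-1, size * size)


def estimate_sigma_pca(img, size=5, step=1):
    """Estimate the AWGN standard deviation from the smallest eigenvalue
    of the patch covariance matrix (plain PCA idea, no patch selection)."""
    patches = extract_patches(np.asarray(img, dtype=np.float64), size, step)
    cov = np.cov(patches, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # returned in ascending order
    return float(np.sqrt(max(eigvals[0], 0.0)))


# Synthetic example (assumed): a smooth intensity ramp plus Gaussian noise.
rng = np.random.default_rng(0)
y, x = np.mgrid[0:128, 0:128]
clean = 100.0 + 0.5 * x + 0.3 * y
true_sigma = 10.0
noisy = clean + rng.normal(0.0, true_sigma, clean.shape)

est_sigma = estimate_sigma_pca(noisy)
print(f"true sigma = {true_sigma}, estimated sigma = {est_sigma:.2f}")
```

With a finite number of patches, the smallest sample eigenvalue tends to fall somewhat below the true noise variance, and on heavily textured images the clean signal leaks into every eigenvalue. These are the effects that the abstract's model of eigenvalue, patch count, and block size, and its patch selection step, are described as addressing.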
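The value 1/(1-n) in contribution 3 follows from a short argument: for n unit vectors, the squared norm of their sum is n plus twice the sum of all pairwise cosines, and it cannot be negative, so the sum of pairwise cosines is at least -n/2. When the n(n-1)/2 pairs share this minimum equally, each cosine equals -1/(n-1) = 1/(1-n). The sketch below builds one such set of fixed class weights and checks the property numerically. The construction (a scaled, centered identity matrix), the function names, the scale value, and the choice n = 7 are assumptions for illustration; the abstract does not state how the dissertation builds its fixed weights or defines the ASL itself.

```python
import numpy as np


def simplex_class_weights(n):
    """Return n unit vectors (rows) whose pairwise cosine is 1/(1-n)."""
    centered = np.eye(n) - np.full((n, n), 1.0 / n)
    return centered * np.sqrt(n / (n - 1))


def cosine_logits(features, weights, scale=16.0):
    """Hypothetical classifier head: scaled cosine similarity between
    L2-normalized features and the fixed (non-trainable) class weights."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    return scale * unit @ weights.T


n = 7  # assumed number of expression categories
W = simplex_class_weights(n)

gram = W @ W.T                              # pairwise cosines of class weights
off_diag = gram[~np.eye(n, dtype=bool)]
print("target cosine 1/(1-n):", 1.0 / (1 - n))
print("max deviation:", np.abs(off_diag - 1.0 / (1 - n)).max())

# A feature pointing along class 2's weight vector scores highest for class 2.
logits = cosine_logits(3.0 * W[[2]], W)
print("predicted class:", int(logits.argmax(axis=1)[0]))
```

In this sketch the features live in the same n-dimensional space as the weights; a real network would need a final layer that maps its features into that space.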
Keywords/Search Tags: Facial Expression Recognition, Noise Level Estimation, Principal Component Analysis, Gabor Filter, Convolutional Neural Network, Gabor Convolutional Network, Discrimination of Feature