
Research On Deep Learning Based Facial Expression Recognition In The Wild

Posted on: 2019-07-28    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J W Yan    Full Text: PDF
GTID: 1368330590460146    Subject: Biomedical Engineering
Abstract/Summary:
Facial expression recognition in the wild (FERW) aims to automatically classify facial expressions captured in unconstrained environments into different categories, so that computers can sense a person's emotional states and inner feelings. FERW is currently a hot research topic in affective computing. It matters both for theoretical research and for a wide variety of applications in human-computer interaction, healthcare, education, and entertainment. In real-life environments, illumination conditions, facial poses, object occlusion, and background noise are all uncontrolled, and traditional methods cannot perform well enough; FERW therefore remains a challenging task for the research community. In recent years, deep learning based methods have shown impressive performance on various computer vision tasks. These methods have surpassed the records of traditional approaches and become very popular in both academia and industry. By constructing deep neural networks, deep learning methods automatically learn discriminative features closely related to the task at hand, improving performance by leaps and bounds. For FERW, we employ this technology to learn more discriminative features, from low-level pixels up to high-level emotional semantics, so as to achieve better performance. To this end, this dissertation carries out the following work.

(1) We propose FERW based on model transfer of deep convolutional neural networks (CNNs). Training CNNs from scratch for facial expression recognition requires a large amount of data; current FERW databases contain very limited samples, and training directly on them easily leads to overfitting. In chapter 2, following the idea of transfer learning, we take CNN weights already trained on other classification tasks, fine-tune the last several layers, and keep the weights of the remaining layers fixed, so that the network can learn discriminative high-level semantic features of facial expressions from limited training samples.

(2) We propose a joint convolutional bidirectional long short-term memory (LSTM) model. For FERW on static images, ordinary CNNs learn only facial texture appearance features. To compensate for this drawback and exploit the spatial relations among different facial regions to further describe the face, we propose a joint convolutional bidirectional LSTM model in chapter 3. The convolutional layers at the front learn texture features of facial regions; an LSTM then models the spatial relationships of these texture features in two directions separately. Finally, the spatial-relation representation and the deep facial texture representation are concatenated for classification.

(3) We propose two-stream convnets with shared attention. To enhance the discriminability of the spatial-temporal features of facial texture changes learned by two-stream convnets, we introduce an attention mechanism at both the convolutional inputs and the concatenated feature maps, and propose two-stream convnets with shared attention in chapter 4. In this model, video data is divided into static image frames and dense optical flow sequences, covering the spatial and temporal perspectives, and the same CNN architecture learns the static and dynamic features for both. Using an exponentially enhanced convolutional input weight and a soft attention module, the network learns to increase or decrease the attention assigned to each region, focusing on regions most related to the expression category while suppressing information from irrelevant regions. The model thus obtains more discriminative spatial-temporal features of facial texture changes.

(4) We propose a multi-cue fusion emotion recognition method. The previous three works focus on learning better facial texture representations, but other related cues are worth exploring in complex wild environments. We therefore propose a multi-cue fusion emotion recognition method in chapter 5. Besides facial texture changes, this method takes facial landmark trajectories and the audio modality into consideration: a cascaded CNN-BRNN models the facial texture changes, while two independent CNNs learn the emotional patterns in landmark trajectories and low-level acoustic features. Finally, the results from the three cues are fused at the decision level.

Together, these works indicate that deep network architectures designed around the characteristics of in-the-wild facial expression images and videos can effectively extract expression-related feature representations from the input data and improve model performance under wild conditions.
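The fine-tuning strategy of contribution (1), freezing the pretrained feature extractor and retraining only the final layers on the small expression dataset, can be sketched in PyTorch as follows. `TinyCNN`, the layer sizes, and the function names are illustrative stand-ins; the dissertation does not specify the exact pretrained architecture:

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Hypothetical small CNN standing in for a model pretrained elsewhere."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def prepare_for_finetuning(model, num_expression_classes=7):
    # Freeze the pretrained feature extractor ...
    for p in model.features.parameters():
        p.requires_grad = False
    # ... and replace the final layer so only it receives gradient updates.
    model.classifier = nn.Linear(model.classifier.in_features,
                                 num_expression_classes)
    return model

model = prepare_for_finetuning(TinyCNN())
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

An optimizer would then be built only over the parameters with `requires_grad=True`, so the frozen layers keep their transferred weights.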
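For contribution (2), one way to let an LSTM model spatial relations over CNN texture features is to treat the rows of the feature map as a sequence. This is only a minimal sketch under that assumption (the dissertation runs bidirectional LSTMs over facial regions in two directions; the sketch shows a single row-wise pass, and all dimensions are illustrative):

```python
import torch
import torch.nn as nn

class ConvBiLSTM(nn.Module):
    """Sketch: CNN texture features -> rows as a sequence -> BiLSTM ->
    concatenate spatial-relation and texture features for classification."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.lstm = nn.LSTM(input_size=16, hidden_size=32,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(16 + 64, num_classes)

    def forward(self, x):
        f = torch.relu(self.conv(x))             # (B, 16, H, W)
        rows = f.mean(dim=3).transpose(1, 2)     # (B, H, 16): one step per row
        _, (h, _) = self.lstm(rows)              # h: (2, B, 32)
        spatial = torch.cat([h[0], h[1]], dim=1) # (B, 64) both directions
        texture = f.mean(dim=(2, 3))             # (B, 16) pooled texture
        return self.fc(torch.cat([texture, spatial], dim=1))

model = ConvBiLSTM()
logits = model(torch.randn(2, 3, 48, 48))
</logits>```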
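The soft attention module of contribution (3) can be approximated as a learned per-location gate that rescales the feature map so that expression-relevant regions are emphasized and irrelevant ones suppressed. This single 1x1-convolution version is an assumption for illustration, not the dissertation's exact module:

```python
import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Learn a per-location weight map in [0, 1] and rescale the features.
    Minimal sketch; the actual module design is not specified here."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, f):
        a = torch.sigmoid(self.score(f))  # (B, 1, H, W), broadcast over channels
        return f * a, a

attn = SoftAttention(16)
feat = torch.randn(2, 16, 7, 7)
weighted, amap = attn(feat)
```

In the two-stream setting, sharing this module between the spatial and temporal streams would realize the "shared attention" idea, with both streams focusing on the same facial regions.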
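Decision-level fusion, as used in contribution (4), combines the per-class probabilities produced by each cue (facial texture, landmark trajectories, audio) into one decision. A common realization is a weighted average of the cue-wise distributions; the dissertation does not state its exact fusion rule, so the uniform weighting below is an assumption:

```python
import numpy as np

def fuse_decisions(prob_lists, weights=None):
    """Decision-level fusion: weighted average of per-cue class probabilities.
    prob_lists: (num_cues, num_classes) probabilities, one row per cue."""
    probs = np.asarray(prob_lists, dtype=float)
    if weights is None:
        weights = np.ones(len(probs)) / len(probs)  # uniform by assumption
    fused = np.average(probs, axis=0, weights=weights)
    return fused, int(fused.argmax())

# Illustrative outputs from the three cues (texture, landmarks, audio).
fused, label = fuse_decisions([[0.6, 0.3, 0.1],
                               [0.2, 0.5, 0.3],
                               [0.3, 0.4, 0.3]])
```

Here the texture cue alone would pick class 0, but the fused distribution favors class 1 because two of the three cues agree on it.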
Keywords/Search Tags:Facial Expression Recognition in the Wild, Deep Learning, Convolutional Neural Networks, Attention Mechanism, Multi-Cue Fusion