Font Size: a A A

Research On Facial Expression Recognition In Video

Posted on:2020-09-20Degree:MasterType:Thesis
Country:ChinaCandidate:T C WangFull Text:PDF
GTID:2428330578964133Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The crucial issue addressed in this paper is the recognition of facial expressions in video or image sequences.Compared with static images,video or image sequence contains spatial information but also sequential logic information.How to use these to improve the performance of this task is the basis of this paper.Classical statistical methods have certain limitations in processing temporal information in video or image sequence.In recent years,with the development of perceptual neural networks,how to use convolutional neural networks to extract temporal information is still an open question.This essay mainly explores the following four aspects:1.Facial expression recognition based on significant changes in geometry.Both geometric and appearance features play an important role in expression recognition.A similar good effect can not be achieved in a video or image sequence in which a static image based on geometric features determined by the marked points or appearance feature blocks determined by experience.The geometric features selected according to the saliency of the marked points movement between frames describe the process of the facial organs deformation.Constructed key blocks according to landmarks and extracted appearance features is more effective.The proposed method is verified in CK+ dataset and compared with some classical and state-of-the-art methods to proof the effectiveness.2.Facial expression recognition based on PLBP-Inception-LSTM.Under the dependency between different inputs,increasing the diversity of convolutional neural network input without changing the basic structure of the network can effectively improve the performance.Since the convolution operation has many similarities with the calculation of LBP,a LBP-like module is proposed and integrates it into the Inception network.Since the expression is formed by the deformation of the facial organs,and the facial organs are diverse in scale,the multi-branch Inception network is used as the basic structure to solve the scale problem.On the other hand,the temporal logic information in the video or image sequence is not well learned by the convolutional neural network,and the LSTM unit is used to process the temporal information to enhance the performance.Experiments were carried out on four public datasets,and compared with some classical and state-of-the-art methods,the effectiveness of the proposed method was verified.3.Facial expression recognition based on binary 3D convolutional network.In addition to using an arithmetic unit that specializes in processing temporal logic information,it is possible to obtain time-space domain information by extending a 2D convolution kernel into a 3D,which is also good in dealing with video or image sequence classification problems.However,due to its high overhead,which limits the application in many scenarios.Through the innovation and transformation of the network structure,a new local binary three-dimensional convolutional network is proposed and mathematically proved that the proposed model approaches the standard 3D convolutional neural network with a high probability under certain conditions.The verification test was carried out on the CK+ dataset,and the experimental results supported the above mathematical proof.4.Facial expression based on multi-modal ensemble.Compared to 2D images,the video also contains a second modality,audio information,which contains foreground,background,and mixed sounds.In this paper,the audio in the video is extracted and the foreground background sound separation is performed,and the features are extracted separately.In addition,the foreground sound features and the background sound features are cascaded to obtain new features.The above four features are respectively classified into SVM on the AFEW 6.0 dataset,and the experimental results show that the audio features are effective for expression classification.Multi-model ensemble is widely used to improve the overall performance.This paper also explores a variety of ensemble methods and combination tricks to improve overall performance.The effectiveness of the method was verified by experiments on four public datasets as well as comparison with some state-of-the-art methods.
Keywords/Search Tags:Expression Recognition, Video, Image Sequence, Geometric Significant Change, Convolutional Neural Network, Long-short Term Memory Network, Model Fusion
PDF Full Text Request
Related items