Facial Expression Recognition Based On Spatial-temporal Representation And Deep Learning Model

Posted on: 2019-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: S L Xu
GTID: 2428330566980052
Subject: Computer application technology
Abstract/Summary:
With the continuous improvement of computer technology and the rapid development of artificial intelligence, facial expression recognition has gradually become a research hotspot. Facial expression recognition uses computer technology to analyze changes in facial expression and thereby infer a person's mental state, enabling more humanized and intelligent interaction between human and machine. It has important practical significance for promoting the application and development of artificial intelligence, enhancing the intelligence of computers, developing new human-machine environments, and advancing psychology and other disciplines, and it can ultimately produce substantial social and economic benefits.

In recent years, deep learning algorithms have developed rapidly and brought new opportunities to many fields. Unlike traditional methods that rely on manual feature extraction, deep neural networks can learn generalized features automatically. Therefore, in view of the particular characteristics of facial expression recognition, this dissertation applies deep learning models to facial expression recognition based on dynamic image sequences.

To capture static and dynamic expression information of facial images at the same time, we build a Pyramid CNN model and combine its deep features with spatial-temporal LBP-TOP features to form the final representation of the expression. First, the frame with the maximum expression intensity in the sequence, called the Apex frame, is selected by calculating the total displacement of the facial landmarks. Because facial expressions are subtly asymmetric between the left and right sides of the face, and in order to capture both global and local features of the Apex frame, we construct the Pyramid CNN model, which is applied separately to the whole Apex face image and to blocks of its local areas; the final Pyramid CNN feature is formed by concatenating the five resulting deep features. For an expression sequence, it is necessary not only to extract the spatial information of the expression images effectively, but also to model how the facial expression changes over time. We therefore use the LBP-TOP operator to extract dynamic texture features of the expression from three orthogonal planes, which better express the essential information of the facial expression. Finally, we feed the combined static Pyramid CNN features and dynamic LBP-TOP features into 'one-vs-one' multi-class SVM classifiers to classify the facial expression.

In addition, to capture the temporal correlation within an expression sequence, we introduce the LSTM structure and construct an end-to-end deep network model based on a convolutional neural network and a Long Short-Term Memory network. The network uses a pretrained CNN model to extract the spatial features of each frame in the video sequence, and these features are then fed into the LSTM network to obtain the dynamic information between frames. Finally, the average of the outputs of all LSTM units, which is taken as the representation of the video sequence, is fed into a classification layer that outputs the expression label.

To verify the effectiveness of the proposed algorithms, we conduct experiments on two standard data sets, CK+ and Oulu-CASIA. The experimental results show that both proposed algorithms perform well on facial expression recognition.
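The Apex frame selection step described in the abstract can be illustrated with a minimal sketch. The abstract states only that the Apex frame is chosen by the total displacement of the facial landmarks; it does not say which frame serves as the reference. The sketch below assumes displacement is measured against the first frame of the sequence, taken to be a neutral face, and sums the Euclidean distance of every landmark. The function names `total_displacement` and `select_apex_frame` and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def total_displacement(frame: Sequence[Point], reference: Sequence[Point]) -> float:
    """Sum of Euclidean distances between matching landmarks of two frames."""
    if len(frame) != len(reference):
        raise ValueError("frames must have the same number of landmarks")
    return sum(
        math.hypot(x - rx, y - ry)
        for (x, y), (rx, ry) in zip(frame, reference)
    )


def select_apex_frame(sequence: Sequence[Sequence[Point]]) -> int:
    """Return the index of the frame whose landmarks moved most from frame 0.

    Assumption (not stated in the abstract): frame 0 is a neutral face and
    serves as the reference for measuring displacement.
    """
    if not sequence:
        raise ValueError("sequence is empty")
    reference = sequence[0]
    scores = [total_displacement(frame, reference) for frame in sequence]
    # max() returns the first index in case of ties.
    return max(range(len(scores)), key=scores.__getitem__)


# Toy sequence: 4 frames with 2 landmarks each (made-up coordinates).
toy_sequence = [
    [(0.0, 0.0), (10.0, 0.0)],  # reference frame, displacement 0
    [(0.0, 1.0), (10.0, 0.0)],  # displacement 1
    [(0.0, 3.0), (10.0, 4.0)],  # displacement 3 + 4 = 7
    [(0.0, 2.0), (10.0, 1.0)],  # displacement 2 + 1 = 3
]
apex_index = select_apex_frame(toy_sequence)
print(apex_index)
```

In practice the landmarks would come from a face alignment tool, and the selected frame would then be passed to the Pyramid CNN; both of those stages are outside this sketch.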
Keywords: Facial Expression Recognition, Convolutional Neural Network, Long Short-Term Memory, Support Vector Machine