| Video facial expression recognition is an important branch of computer vision and human-computer interaction. Facial expressions are a true reflection of human emotions, and studying them can enable machines to better understand human emotions and communicate with people. In previous studies, video facial expression recognition has generally used conventional image algorithms, such as the optical flow method, to extract facial expression features, which are then sent to a classifier for training. This approach is highly dependent on the quality of the images and of the extracted features. In addition, expression images produced spontaneously by people in real scenes are easily disturbed by lighting and pose, which greatly reduces the robustness of conventional image algorithms. With the progress of deep learning methods in image classification and video recognition, increasing attention has been paid to video facial expression recognition based on deep learning. In recent years, convolutional neural networks and recurrent neural networks have been commonly used to extract facial expression features from video frames; the extracted features are sent to a classifier to build an end-to-end video facial expression recognition model. This paper focuses on video facial expression recognition based on deep neural networks. First, the theory of deep neural networks is described in detail. Second, the main steps of video facial expression recognition using deep learning are summarized. Based on the characteristics of video expression recognition, the structure of the basic convolutional neural network and the problems in model pre-training and image pre-processing are explored in depth. For discrete frames, an end-to-end model based on second-order information extracted by the bilinear CNN method is proposed. For dynamic frames, an extraction method combining inter-frame information
and the global information of a non-local network is proposed. Finally, we use a deep model compression algorithm to reduce the model size and develop a real-time video facial expression recognition demo system for mobile devices. The contributions of this paper are as follows: 1. We propose an improved face alignment algorithm. 2. We propose an improved convolutional neural network structure based on VGGNet. 3. Using transfer learning, we pre-train our model on large-scale expression datasets. 4. Drawing on fine-grained classification, we extract second-order expression features with a bilinear operation for classification. 5. We propose a method combining a gated recurrent unit with a non-local network to extract local and global expression features and integrate temporal information, so as to classify expressions in dynamic frames. 6. We use a model pruning algorithm to compress the model and develop a real-time demo system for video facial expression recognition. |
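
To illustrate the second-order features extracted by a bilinear operation (item 4 above), here is a minimal NumPy sketch of bilinear pooling over two convolutional feature maps. The abstract does not specify the implementation. The function name `bilinear_pool`, the input shapes, and the signed square-root and L2 normalization steps are assumptions taken from the common bilinear CNN formulation, not details confirmed by this paper.

```python
import numpy as np


def bilinear_pool(feat_a, feat_b, eps=1e-12):
    """Hypothetical helper: second-order (bilinear) pooling of two feature maps.

    feat_a: array of shape (C1, H, W) from one CNN stream.
    feat_b: array of shape (C2, H, W) from another (or the same) stream.
    Returns a normalized vector of length C1 * C2.
    """
    c1, h, w = feat_a.shape
    c2 = feat_b.shape[0]
    if feat_b.shape[1:] != (h, w):
        raise ValueError("both feature maps must share the same spatial size")

    # Flatten the spatial grid so each column is one spatial location.
    a = feat_a.reshape(c1, h * w)
    b = feat_b.reshape(c2, h * w)

    # Average of the outer products of the two descriptors over all locations.
    outer = a @ b.T / (h * w)
    vec = outer.reshape(-1)

    # Signed square root followed by L2 normalization (assumed post-processing).
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    return vec / (np.linalg.norm(vec) + eps)
```

In a full model the resulting vector would typically be passed to a linear classifier over the expression classes; that classifier, and the choice of backbone producing the feature maps, are outside the scope of this sketch.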