
Video Based Facial Expression Recognition

Posted on: 2018-01-15 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L P Xie
GTID: 1318330542451404 | Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Facial expression recognition (FER) is the task of determining which of a set of predefined categories an unknown image or video of a facial expression belongs to, given some training samples. With the rapid development of artificial intelligence, computer science, and related fields in the 21st century, human-computer interaction has received increasing attention. We hope that computers and robots can not only speak, hear, and see, but also recognize and express human feelings and emotions; such barrier-free interaction would allow them to serve people better. Regarding emotional expression, the well-known psychologist Mehrabian holds that verbal cues convey 7 percent of the meaning of a message, vocal cues 38 percent, and facial expressions 55 percent, which underlines the importance of facial expressions in communication. FER has since become an active topic in the pattern recognition and machine vision communities, and a growing number of researchers work in this area. Computers that can identify human feelings would have wide applications in fields such as security, education, neurology, law, and communication technology.

Depending on the type of input, FER methods can be classified as image-based or video-based. Image-based methods have progressed rapidly over the last several decades owing to their simplicity, speed, and convenience, and they achieve good recognition performance under certain conditions. However, a single image contains limited information, and external conditions and individual differences strongly affect recognition performance. Moreover, with the development of computer technology, researchers have paid increasing attention to video-based FER, which works on video sequences. The development of a facial expression is a complex spatio-temporal process.
A video of an expression captures how the expression changes over time, and therefore contains both static and dynamic facial information. Video-based FER thus has considerable practical significance and value.

In this thesis, we present a series of approaches to video-based FER, with the ultimate aim of improving both the accuracy and the timeliness of recognition. For a given expression video, classification is usually performed after features have been extracted from the whole video. Most current research follows this pipeline, as does the work presented in the preceding four chapters. However, timely detection and reaction are fundamental for a robot communicating with humans: if a robot can detect a facial expression only after the expression has finished, its reaction will be noticeably delayed, resulting in a poor user experience. We therefore study early facial expression detection (EFD), a relatively new and challenging problem. The goal of EFD is to detect a facial expression as early as possible, with the recognition result checked and corrected as more frames arrive. The contributions of the thesis are summarized as follows.

For video sequences of facial expressions, we propose a new feature extraction algorithm named local Gabor binary patterns from three orthogonal planes (LGBP-TOP). First, the gray-scale images are decomposed by a Gabor wavelet transform into sub-images that capture the edge features at different frequencies; this meets the multi-scale and multi-resolution requirements of edge features. Second, the LBP-TOP descriptor, which is invariant to rotation, translation, scaling, and so on, is computed on these sub-images.
LGBP-TOP thus combines multi-scale, multi-orientation analysis with sensitivity to small changes in bright spots and edges.

Using only one type of feature to describe facial expressions in video sequences is often inadequate, because the available information is very complex. Since the dimensionality of such features is usually high, we introduce multi-view dimension reduction (MVDR) into video-based FER. The key issue in MVDR is to remove redundant and irrelevant information from all views simultaneously. It is therefore essential to exploit the feature correlations within and between views while respecting the diversity of the views. Inspired by structured sparse learning, we propose two novel MVDR frameworks: multi-view exclusive unsupervised dimension reduction, and joint structured sparsity regularized MVDR.

In recent years, artificial neural networks have been a major focus of research, so we investigate their use for FER classification. Building on the structure decomposition method, which further simplifies the network, we propose two novel algorithms that combine it with Skeletonization (S-DBSkeletonization) and Cascade-Correlation (SDBCC) neural networks. We first decompose the complex six-class FER problem into six single-output problems, each of which can be treated as an independent problem. After learning all of these problems in parallel with the Skeletonization or Cascade-Correlation algorithm, we integrate the six subnets into a final decision. Compared with the original six-class problem, each subnet has a simpler task, which leads to a more compact network and better recognition performance.

Few works on EFD have been proposed. MMED (the max-margin early event detector) is the most popular model for early event detection and achieves competitive performance in EFD.
However, it lacks the flexibility to extract discriminative information for training, and training with its large number of constraints is quite slow. To overcome these drawbacks, we first propose a multi-instance learning based early facial expression detector and then extend it to the online setting, which greatly decreases memory consumption and significantly reduces training time.
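The abstract describes LGBP-TOP only at a high level. The following is a minimal NumPy sketch of the general idea, not the thesis's implementation. Each frame is filtered with a small bank of complex Gabor kernels, and the response magnitudes are stacked into a volume. Basic 8-neighbour LBP histograms from the XY, XT, and YT planes of each volume are then normalized and concatenated. The scales, orientations, wavelength rule, kernel size, and all function names (`gabor_kernel`, `filter2d`, `lbp_codes`, `lbp_top`, `lgbp_top`) are hypothetical choices made for illustration.

```python
import numpy as np

# 8-neighbour offsets (dy, dx) at radius 1, clockwise from the top-left pixel
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def gabor_kernel(sigma, theta, wavelength):
    """Complex 2-D Gabor kernel: Gaussian envelope times a complex carrier along theta."""
    half = int(np.ceil(2 * sigma))  # hypothetical truncation radius
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return envelope * np.exp(1j * 2 * np.pi * xr / wavelength)


def filter2d(img, kernel):
    """'Same'-size 2-D filtering (correlation) with edge padding, via shifted sums."""
    kh, kw = kernel.shape
    h, w = img.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros((h, w), dtype=complex)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out


def lbp_codes(plane):
    """Basic 8-bit LBP code for every interior pixel of a 2-D array."""
    r, c = plane.shape
    center = plane[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = plane[1 + dy:r - 1 + dy, 1 + dx:c - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def lbp_top(volume):
    """Normalised LBP histograms from the XY, XT and YT planes of a T x H x W volume."""
    t, h, w = volume.shape
    if min(t, h, w) < 3:
        raise ValueError("each axis needs at least 3 samples for radius-1 LBP")
    planes = (
        [volume[k] for k in range(t)],        # XY planes (one per frame)
        [volume[:, k, :] for k in range(h)],  # XT planes (one per row)
        [volume[:, :, k] for k in range(w)],  # YT planes (one per column)
    )
    hists = []
    for slices in planes:
        hist = sum(np.bincount(lbp_codes(s).ravel(), minlength=256) for s in slices)
        hists.append(hist / hist.sum())
    return np.concatenate(hists)


def lgbp_top(video, sigmas=(2.0, 4.0), n_orient=4):
    """Concatenate LBP-TOP histograms of Gabor-magnitude volumes (hypothetical filter bank)."""
    video = np.asarray(video, dtype=float)
    feats = []
    for sigma in sigmas:
        for k in range(n_orient):
            kern = gabor_kernel(sigma, np.pi * k / n_orient, wavelength=2 * sigma)
            magnitude = np.stack([np.abs(filter2d(f, kern)) for f in video])
            feats.append(lbp_top(magnitude))
    return np.concatenate(feats)
```

With the default bank of 2 scales and 4 orientations, the descriptor has 2 × 4 × 3 × 256 = 6144 dimensions. This illustrates why the high dimensionality noted in the abstract calls for dimension reduction. Practical LBP-TOP systems usually also split the volume into spatial blocks and use uniform patterns. Both refinements are omitted here for brevity.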
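The abstract names structured sparsity as the basis of both MVDR frameworks but does not state their objectives. As an illustration only, the sketch below shows two structured-sparsity penalties commonly used for multi-view feature selection. The first is the L2,1 norm, which promotes row sparsity so that a feature is kept or dropped jointly. The second is an exclusive-lasso-style penalty, under which features in the same view compete. The sketch also includes a proximal-gradient solver for a least-squares loss with the L2,1 penalty. The target matrix `Y` is an assumption (in an unsupervised setting it might be, for example, a spectral embedding), as are the view layout and every function name. None of this is the thesis's actual formulation.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per input feature)."""
    return np.linalg.norm(W, axis=1).sum()


def exclusive_norm(W, view_sizes):
    """Exclusive-lasso-style penalty: sum over views of (sum of row norms in that view)^2."""
    row_norms = np.linalg.norm(W, axis=1)
    bounds = np.cumsum([0, *view_sizes])
    return sum(row_norms[a:b].sum() ** 2 for a, b in zip(bounds[:-1], bounds[1:]))


def prox_l21(W, t):
    """Proximal operator of t * ||W||_{2,1}: shrink each row's norm by t, zeroing short rows."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-12))
    return W * scale


def l21_regression(X, Y, lam, n_iter=500):
    """Proximal gradient (ISTA) for 0.5 * ||X W - Y||_F^2 + lam * ||W||_{2,1}."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Y)
        W = prox_l21(W - step * grad, step * lam)
    return W


def selected_features(W, tol=1e-8):
    """Indices of features (rows of W) that survive the row-sparse solution."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)
```

If the per-view features are concatenated column-wise into `X`, `selected_features(W)` returns the features kept across all views in a single solve. This mirrors, on a small scale, the abstract's goal of removing redundant and irrelevant information from all views simultaneously.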
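The structure decomposition step can be illustrated independently of the network type. In the minimal sketch below, each of the six subnets is a single logistic output unit trained by gradient descent on a binary "class c versus the rest" target. This is a deliberately simple stand-in, not Skeletonization or Cascade-Correlation. The subnets are trained independently, so they could run in parallel, and their outputs are integrated by taking the largest score. The function names, learning rate, and epoch count are assumptions.

```python
import numpy as np


def train_subnet(X, t, lr=0.1, epochs=1000):
    """Train one single-output logistic unit on binary targets t (1 = this class, 0 = rest)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid output of the subnet
        err = p - t                              # cross-entropy gradient w.r.t. the logit
        w -= lr * X.T @ err / len(t)
        b -= lr * err.mean()
    return w, b


def train_decomposed(X, y, n_classes=6):
    """Decompose an n-class problem into n independent one-output problems."""
    return [train_subnet(X, (y == c).astype(float)) for c in range(n_classes)]


def predict(subnets, X):
    """Integrate the subnets: pick the class whose subnet gives the largest logit."""
    logits = np.column_stack([X @ w + b for w, b in subnets])
    return logits.argmax(axis=1)
```

Each subnet only has to separate one class from the rest. This is the sense in which the abstract says the per-subnet task is simpler than the original six-class problem.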
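The abstract defines EFD as reaching a decision as early as possible while checking and correcting it as frames arrive, but it does not give the detector's form. The sketch below illustrates only that streaming inference loop, not MMED or the thesis's multi-instance detector. It assumes hypothetical per-frame feature vectors and a hypothetical, already trained linear scorer (`W`, `b`) applied to the running mean of the frames seen so far. It commits to a label once the gap between the best and second-best class scores reaches a chosen threshold.

```python
import numpy as np


def early_detect(frames, W, b, threshold):
    """Score every prefix of a frame stream; return (t, label, committed) after each frame."""
    frames = np.asarray(frames, dtype=float)
    running = np.zeros(frames.shape[1])
    history = []
    for t, frame in enumerate(frames, start=1):
        running += (frame - running) / t        # running mean of the first t frames
        scores = W @ running + b                # one score per expression class
        second, best = np.sort(scores)[-2:]
        label = int(np.argmax(scores))
        history.append((t, label, best - second >= threshold))
    return history


def first_detection(history):
    """First (t, label) whose score margin reached the threshold, or None."""
    for t, label, committed in history:
        if committed:
            return t, label
    return None
```

In a toy run where the first two frames are neutral and later frames favour one class, the provisional label switches at the third frame and is committed at the fourth. Because the full history is kept, a provisional decision can still be revised as later frames arrive.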
Keywords/Search Tags: facial expression recognition, video-based, multi-view dimension reduction, neural networks, early facial expression detection