
Joint Patch And Multi-label Learning Of Facial Action Unit Detection

Posted on: 2017-08-17    Degree: Doctor    Type: Dissertation
Country: China    Candidate: K L Zhao    Full Text: PDF
GTID: 1318330518996011    Subject: Signal and Information Processing
Abstract/Summary:
The face is one of the most powerful channels of non-verbal communication. Thirty anatomically based facial actions (referred to as action units, or AUs), along with additional descriptors, can occur alone or in thousands of combinations to produce nearly all possible facial expressions. Most existing methods to detect AUs use one-vs-all classifiers without considering the dependencies that exist between AUs. Exploiting such dependencies could reduce the number of model parameters and improve generalization. To address these issues, we introduce a Joint Patch and Multi-label Learning (JPML) framework that leverages group sparsity to identify useful facial patches for each AU while exploiting relations between AUs to learn multi-label classifiers. Dependencies between AUs were determined by statistically analyzing more than 350,000 frames. To the best of our knowledge, this is the first work that jointly addresses patch and multi-label learning for AU detection. In addition, we show that JPML can be extended to recognize holistic expressions by learning common and specific patches, which afford a more compact representation than standard patch-learning methods. In four of five comparisons on three diverse datasets, JPML produced the highest average F1 scores in comparison with the state of the art.

Because AUs are active only on sparse facial regions, region learning (RL) aims to identify these regions for better specificity. On the other hand, strong statistical evidence of AU correlations suggests that multi-label learning (ML) is a natural way to model the detection task. We therefore also propose Deep Region and Multi-label Learning (DRML), a unified deep network that simultaneously addresses these two problems. One crucial aspect of DRML is a novel region layer that uses feed-forward functions to induce important facial regions, forcing the learned weights to capture structural information of the face.
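The JPML idea above — selecting entire facial patches via group sparsity while training one classifier for all AUs jointly — can be sketched as proximal gradient descent with a group-lasso penalty, where each group is the block of feature weights belonging to one patch. This is only an illustrative sketch: the function names, the squared loss, and the fixed step size are assumptions, not the dissertation's actual formulation.

```python
import numpy as np

def group_soft_threshold(W, groups, lam):
    """Proximal operator of the group-lasso penalty: each group of rows
    corresponds to one facial patch; a patch whose whole weight block is
    small gets zeroed out entirely, deselecting it for every AU at once."""
    W = W.copy()
    for g in groups:                          # g: row indices of one patch
        norm = np.linalg.norm(W[g])
        W[g] = 0.0 if norm <= lam else W[g] * (1.0 - lam / norm)
    return W

def jpml_sketch(X, Y, groups, lam=0.5, lr=0.01, iters=300):
    """Toy joint patch and multi-label learner: squared loss summed over
    all AU labels, group-lasso over patches, solved by proximal gradient.
    X: (n, d) features grouped by patch; Y: (n, k) AU label matrix."""
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    for _ in range(iters):
        grad = X.T @ (X @ W - Y) / n          # gradient shared across labels
        W = group_soft_threshold(W - lr * grad, groups, lr * lam)
    return W
```

Because the penalty acts on whole row blocks rather than individual weights, patch selection is coupled across all AU labels — the "joint" aspect that a per-label one-vs-all lasso would miss.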
Our region layer serves as an alternative design between locally connected layers (i.e., kernels confined to individual pixels) and conventional convolution layers (i.e., kernels shared across an entire image). Unlike previous studies that solve RL and ML alternately, DRML by construction addresses both problems, allowing the two seemingly unrelated problems to interact more directly. The complete network is end-to-end trainable, and automatically learns representations robust to variations inherent within a local region. Experiments on the BP4D and DISFA benchmarks show that DRML achieves the highest average F1 score and AUC within and across datasets in comparison with alternative methods.

To summarize, this thesis makes the following contributions:
· To leverage local dependencies between features and AU occurrences, we present a Joint Patch and Multi-label Learning (JPML) framework that simultaneously selects a sparse subset of facial patches and learns a multi-label AU classifier. To the best of our knowledge, this is the first effort to jointly address patch and multi-label learning for AU detection.
· To learn dependencies between AUs, we statistically analyzed over 350,000 labeled samples and categorized the dependencies into positive correlations (likely co-occurrences) and negative correlations (rare co-occurrences). The findings are consistent with the existing literature, including the FACS manual.
· We develop an extension of JPML that disentangles common and specific patches to recognize holistic facial expressions. In experimental tests, this extension reveals patches shared among and specific to particular expressions, and achieves accuracy comparable to or exceeding alternative approaches.
· We introduce a new region layer that serves as an alternative design between locally connected layers and conventional convolution layers.
· The final network is end-to-end trainable, and converges faster with better-learned AU relations than alternative models.
· To validate the generality of patch/region and multi-label learning for other applications in computer vision, we conducted experiments on multi-label expression analysis, action recognition, and scene classification.
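The design point the region layer occupies — between no weight sharing (locally connected) and global sharing (conventional convolution) — can be illustrated with a minimal single-channel sketch. The 2×2 grid, 3×3 kernels, and zero padding are illustrative assumptions; the actual DRML layer is multi-channel and includes further components omitted here.

```python
import numpy as np

def region_layer(x, kernels, grid=(2, 2)):
    """Single-channel sketch of a region layer: the input map is cut into
    a grid and each region is filtered with its own 3x3 kernel, so weights
    are shared within a region but not across regions -- midway between a
    locally connected layer (no sharing at all) and a conventional
    convolution (one kernel shared across the whole image)."""
    H, W = x.shape
    gh, gw = grid
    rh, rw = H // gh, W // gw
    out = np.zeros_like(x, dtype=float)
    for i in range(gh):
        for j in range(gw):
            patch = x[i*rh:(i+1)*rh, j*rw:(j+1)*rw]
            k = kernels[i][j]                 # this region's private kernel
            padded = np.pad(patch, 1)         # zero padding keeps region size
            for u in range(rh):
                for v in range(rw):
                    out[i*rh + u, j*rw + v] = np.sum(padded[u:u+3, v:v+3] * k)
    return out
```

Setting every `kernels[i][j]` to the same array recovers (blockwise) ordinary convolution, while shrinking each region to a single pixel recovers a locally connected layer — which is the sense in which the layer interpolates between the two extremes.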
Keywords/Search Tags:Facial Action Units, facial expression analysis, group sparsity, patch learning, region learning, multi-label learning, convolutional neural networks