
Facial Motion Capture Based On Sparse Representation And Cascaded Regression

Posted on: 2019-08-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Zhang
GTID: 1368330590472974
Subject: Computer Science and Technology
Abstract/Summary:
Facial motion capture aims to capture the movement of facial landmarks (such as mouth corners, eye corners and brows), the 3D facial pose and changes of expression. The captured information is then used to reconstruct the 3D face and drive the performance of a 3D avatar. In recent years, with the development of computer vision and computer graphics, facial motion capture has become a focus of research in computer animation. It also plays an important role in areas such as intelligent human-computer interaction, virtual reality and augmented reality. Facial motion capture from monocular videos is a particularly promising direction in this field, and it offers a more practical solution for ordinary users. In this research we carry out an in-depth investigation of intelligent facial motion capture from monocular videos, focusing mainly on the regression and optimization problems involved. The key tasks in facial motion capture from monocular videos are face tracking, facial landmark localization, and the estimation of 3D pose and expression. To overcome the deficiencies of current methods in complex scenes, we focus on robust face tracking under occlusion, regression for facial landmark localization across poses, robust regression from corrupted faces, and multi-parameter regression for the estimation of 3D pose and expression. The main research content is as follows.

1) To solve the partial occlusion and object disappearance problems that affect existing face tracking algorithms, we propose a novel robust tracking method based on expanding templates and sparse representation. Expanding templates (a fixed template and a recent template) are combined with a sparse representation-based tracker to improve tracking accuracy and robustness to occlusion. Experiments on several public datasets show that the expanding templates improve the ability of the sparse representation-based tracker to handle occlusion, while preserving the advantages of the original algorithm.

2) To improve the performance of multi-pose facial landmark localization in the wild, we propose a sign-correlation supervised descent method based on nonlinear optimization theory. Unlike previous methods that train only one global model on the multi-pose dataset, we analyze the sign correlation between features and shapes and project both into a mutual sign-correlation subspace that is highly related to pose variation. Samples within each hyper-octant of this subspace satisfy the condition that a single general descent direction exists, so the full set of multi-pose samples is partitioned into a series of pose-consistent subsets, and a separate model is learned for each subset. The sign-correlation partition method is validated on public multi-pose datasets; the results indicate that it separates samples into pose-consistent subsets with high accuracy, revealing its latent relationship to pose. Comparisons with state-of-the-art methods show that our method outperforms them, especially in uncontrolled conditions with various poses, while keeping comparable speed.

3) Outliers and noise caused by partial occlusion and extreme illumination often lead to failures in facial landmark localization across poses. To address this issue, we propose a novel robust regression method based on low-rank-sparse subspace representation. Unlike most existing robust methods, our approach carries out the two phases of subspace recovery and regression optimization simultaneously. Compared with the current low-rank-constrained robust regression method for single-pose or single-expression scenes, our low-rank-sparse subspace representation is better suited to modeling corrupted faces in multi-pose and multi-expression scenes. We apply the linearized alternating direction method with adaptive penalty (LADMAP) to optimize the model, and give a convergence proof and a complexity analysis. Experiments on synthetic data verify the convergence of the method and its ability to perform regression in multiple subspaces. Experiments on benchmark datasets against several state-of-the-art robust methods indicate that our method improves the accuracy of regression from corrupted faces and reduces the errors of face recovery and facial landmark localization across poses.

4) Existing regression methods for estimating 3D pose and expression suffer under large out-of-plane variations of pose and expression. To address this issue, we propose an efficient and robust method that introduces a novel supervised coordinate descent method with a 3D bilinear model. Instead of directly learning the mapping between all parameters and image features within a cascaded regression framework, as current methods do, we learn the mappings for individual parameter sets separately, step by step, in a coordinate descent manner. Since different parameters contribute differently to the displacement of facial landmarks, learning each kind of parameter separately reduces the cross-effects introduced by learning them jointly. Experiments on challenging online videos demonstrate that the method achieves more reliable 3D face reconstruction with a smaller 2D facial landmark error. Compared with current regression-based methods, the proposed method performs better under head pose variation and extreme out-of-plane expressions.

Finally, all proposed methods are integrated into a robust facial motion capture system that captures facial motion stably in complex scenes involving various poses, extreme expressions, partial occlusion and illumination variation.
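The tracker in contribution 1) scores candidate patches by how well a sparse combination of templates reconstructs them, with the template set expanded by a fixed and a recent template. The following is a minimal sketch of that scoring idea, not the thesis implementation: it assumes plain-list vectors, solves a non-negative lasso with ISTA (a generic solver swapped in for illustration), omits the trivial templates that L1-style trackers often use to absorb occlusion, and all function names are hypothetical.

```python
# Minimal sketch of sparse-representation candidate scoring with an expanded
# template set (dictionary + fixed template + recent template).
# Assumptions: vectors are plain Python lists; the sparse code is found with
# ISTA on a non-negative lasso; trivial (occlusion) templates are omitted.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reconstruct(templates, a, dim):
    """Compute T a, where the columns of T are the template vectors."""
    return [sum(a[j] * templates[j][i] for j in range(len(templates))) for i in range(dim)]

def ista_nonneg(templates, y, lam=0.01, steps=200):
    """Approximately solve min_a 0.5*||T a - y||^2 + lam*||a||_1 s.t. a >= 0."""
    # 1/L step size; L is bounded above by the squared Frobenius norm of T.
    lip = sum(dot(t, t) for t in templates) or 1.0
    step = 1.0 / lip
    a = [0.0] * len(templates)
    for _ in range(steps):
        resid = [r - yi for r, yi in zip(reconstruct(templates, a, len(y)), y)]
        grad = [dot(t, resid) for t in templates]  # T^T (T a - y)
        # Gradient step, then non-negative soft-thresholding (the prox step).
        a = [max(0.0, aj - step * (gj + lam)) for aj, gj in zip(a, grad)]
    return a

def residual_error(templates, y, a):
    """Squared reconstruction error ||T a - y||^2."""
    return sum((r - yi) ** 2 for r, yi in zip(reconstruct(templates, a, len(y)), y))

def best_candidate(dictionary, fixed, recent, candidates, lam=0.01):
    """Return the index of the candidate best explained by the templates."""
    templates = list(dictionary) + [fixed, recent]
    errors = [residual_error(templates, y, ista_nonneg(templates, y, lam))
              for y in candidates]
    return min(range(len(candidates)), key=errors.__getitem__)
```

In a real tracker the recent template would presumably be refreshed from the latest tracking result while the fixed template stays constant; that update policy is not shown here.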
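Contribution 2) partitions training samples by the hyper-octant they occupy in a projected subspace and learns one model per subset. A minimal sketch of the partition-and-route step follows, under stated assumptions: the projection directions are taken as given (deriving them from the sign correlation between features and shapes is not shown), only a feature vector is projected because shapes are unknown at test time, and the names `partition`, `fit_per_subset` and `predict` are hypothetical.

```python
from collections import defaultdict

# Hypothetical helpers illustrating hyper-octant partitioning. The projection
# directions are assumed to be given; how they are derived is not shown.

def project(x, directions):
    """Coordinates of vector x along each projection direction."""
    return [sum(d_i * x_i for d_i, x_i in zip(d, x)) for d in directions]

def octant_key(x, directions):
    """Hyper-octant label: sign pattern of the projected coordinates
    (True for >= 0, False for < 0)."""
    return tuple(c >= 0 for c in project(x, directions))

def partition(samples, directions):
    """Group sample indices by the hyper-octant they fall into."""
    groups = defaultdict(list)
    for idx, x in enumerate(samples):
        groups[octant_key(x, directions)].append(idx)
    return dict(groups)

def fit_per_subset(groups, targets, fit):
    """Learn one model per subset; `fit` maps a list of targets to a model."""
    return {key: fit([targets[i] for i in idxs]) for key, idxs in groups.items()}

def predict(x, directions, models, default=None):
    """Route x to the model of its hyper-octant (default if that octant was unseen)."""
    return models.get(octant_key(x, directions), default)
```

In a cascaded-regression setting, `fit` would be a per-subset least-squares descent map; passing a simple mean keeps the example small while preserving the one-model-per-subset structure.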
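Contribution 4) pairs a 3D bilinear face model with supervised coordinate descent, updating each parameter set separately rather than all at once. The sketch below is illustrative only and assumes: the core tensor is stored as nested lists indexed [coordinate][identity][expression]; each parameter group (hypothetically named "pose" and "exp") has its own linear regressor (R, b) per cascade stage; and features are re-extracted after every group update. The function names and the linear form of the regressors are assumptions, not the thesis's exact formulation.

```python
# Hypothetical sketch: bilinear shape model plus a coordinate-descent cascade
# in which each parameter group gets its own regressor at every stage.

def bilinear_shape(core, w_id, w_exp):
    """Contract core[v][i][j] with identity weights w_id[i] and expression
    weights w_exp[j]; returns a flat vector of vertex coordinates."""
    return [
        sum(core_v[i][j] * w_id[i] * w_exp[j]
            for i in range(len(w_id)) for j in range(len(w_exp)))
        for core_v in core
    ]

def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def coordinate_descent_cascade(params, features, stages, order=("pose", "exp")):
    """params: dict group -> list of floats.
    features: callable(params) -> feature vector (list of floats).
    stages: list of dicts group -> (R, b), one linear regressor per group.
    Groups are updated one at a time, re-extracting features before each."""
    params = {g: list(v) for g, v in params.items()}
    for stage in stages:
        for group in order:
            R, b = stage[group]
            phi = features(params)
            delta = [d + bi for d, bi in zip(matvec(R, phi), b)]
            params[group] = [p + d for p, d in zip(params[group], delta)]
    return params
```

Recomputing features between group updates is what separates this scheme from a joint cascade, which would apply a single regressor to the concatenated parameter vector and so mix the effects of pose and expression in one step.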
Keywords/Search Tags: Face tracking, facial landmark localization, 3D face reconstruction, sparse representation, cascaded regression