
Head Pose Estimation Based On Attention Supervision

Posted on: 2021-02-01  Degree: Master  Type: Thesis
Country: China  Candidate: Y K Ge  Full Text: PDF
GTID: 2518306308968659  Subject: Information and Communication Engineering
Abstract/Summary:
Head pose estimation plays an important role in computer vision. It has a wide range of practical applications, such as aiding gaze estimation, human-computer interaction, and driver fatigue detection. However, annotating the pose of a face is difficult, so few training samples are available for fine-grained head pose estimation, and the accuracy and generalization performance of a model cannot be guaranteed. In addition, most head pose estimation methods learn the 3D pose parameters of the face directly from 2D face images and therefore lack 3D supervision, which limits their accuracy. In this thesis, HopeNet, an advanced end-to-end fine-grained head pose estimation method, is used as the baseline model. Building on it, two improved methods are proposed to address the lack of training samples and the lack of 3D supervision.

1) To increase the amount and diversity of training data, we propose a new method named 3DAug-HopeNet. Unlike traditional data augmentation methods such as flipping, scaling, and cropping, it uses 3D face reconstruction based on the 3DMM to generate profile-view face images with fine-grained pose annotations. This is a general method for generating multi-pose face images, and it can also be used with other head pose estimation methods. The additional training data reduces the mean absolute error of head pose estimation and improves the generalization of the model. Compared with HopeNet, the mean absolute error of the method on the public datasets AFLW2000 and BIWI is reduced by 9.1% and 4.4%, respectively.

2) To introduce 3D supervision, we propose the AT-HopeNet method, which introduces attention based on the face depth image into a hidden layer of the model to guide the training process. Using the face depth image as the attention map introduces not only 3D supervision but also spatial attention, which guides the network to focus on important areas and ignore irrelevant ones. Traditional spatial and channel attention methods usually learn attention from the feature maps themselves, whereas this method uses the face depth image to direct the network's attention to the target area, which is closer to the human attention mechanism. Compared with HopeNet, the mean absolute error of the method on the public datasets AFLW2000 and BIWI is reduced by 15.1% and 15.0%, respectively.
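The idea of using a face depth image as a spatial attention map can be illustrated with a small sketch. The abstract does not specify how the depth image enters the hidden layer, so the code below assumes the simplest reading: the depth map is normalized to [0, 1], resized to the feature-map resolution, and multiplied element-wise into every channel. It also assumes that larger depth values mark the face and that the background is near zero. All function names are hypothetical and are not taken from the thesis or from HopeNet. The last function computes a mean absolute error averaged over yaw, pitch, and roll, which is assumed here to be the form of the metric the abstract reports.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # a single H x W map (one channel)


def normalize_depth(depth: Grid) -> Grid:
    """Scale a depth map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in depth]
    return [[(v - lo) / span for v in row] for row in depth]


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize so the map matches the feature resolution."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def apply_depth_attention(features: List[Grid], depth: Grid) -> List[Grid]:
    """Weight every channel of a C x H x W feature map by the depth map.

    Assumption: attention is applied by element-wise multiplication.
    """
    h, w = len(features[0]), len(features[0][0])
    attn = resize_nearest(normalize_depth(depth), h, w)
    return [
        [[channel[i][j] * attn[i][j] for j in range(w)] for i in range(h)]
        for channel in features
    ]


Angles = Tuple[float, float, float]  # (yaw, pitch, roll) in degrees


def mean_absolute_error(pred: Sequence[Angles], truth: Sequence[Angles]) -> float:
    """Mean absolute angular error, averaged over samples and the three angles."""
    total = sum(abs(p - t) for ps, ts in zip(pred, truth) for p, t in zip(ps, ts))
    return total / (3 * len(pred))


# Toy usage: one 2 x 2 feature channel and a 2 x 2 depth map.
features = [[[2.0, 2.0], [2.0, 2.0]]]
depth = [[0.0, 5.0], [10.0, 5.0]]
attended = apply_depth_attention(features, depth)
```

In this toy case the location with zero depth is suppressed entirely and the location with the largest depth keeps its full activation. A real network would apply the same weighting to tensors inside the model rather than to nested lists.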
Keywords/Search Tags:deep learning, fine-grained head pose estimation, 3D face reconstruction, attention