Head pose estimation, that is, determining the pose of the human head relative to the camera viewing direction, is an important branch of computer vision based on human biometrics. Head pose can be applied in scenarios such as fatigue-driving detection, smart classrooms, and human-computer interaction. It is also a foundational task for many face-related tasks, such as face recognition and gaze estimation. Head pose estimation therefore has significant research value. This paper combines RGB images and depth images and uses convolutional neural networks to estimate head pose. The main work includes the following aspects:

(1) A head pose estimation method based on multi-task learning and an attention mechanism is proposed. The algorithm predicts head pose from a single RGB image. First, unlike conventional multi-task algorithms, the yaw, pitch, and roll angles that represent head pose are separated into three sub-tasks and predicted individually, so that the network learns the "personalized" features of each angle. Next, an angle classification task is designed to guide the regression of each angle, and a convolutional neural network with four prediction branches is constructed with GhostNet as the backbone; the attention module CBAM is introduced into the four task branches to strengthen useful features and suppress useless ones. Finally, by introducing offsets, the conventional calculation of the head pose angle values is improved (a sketch of this classification-guided regression is given at the end of this abstract). The algorithm is evaluated on the 300W-LP, AFLW2000, and BIWI datasets. The experimental results show that the average prediction error is kept below 5°, which compares favorably with similar algorithms.

(2) A head pose estimation method based on multi-level feature fusion of depth images is proposed. Unlike methods that build a 3D model from the depth image to predict head pose, this method predicts head pose directly from the depth map. First, ResNet101, which has strong feature extraction capability, is used as the backbone network. Then, in order to fully exploit both the local and the global spatial information of the depth map, a multi-level feature fusion module is constructed. Finally, as in the multi-task and attention-based method above, the three angles are each treated as a separate task, and a convolutional neural network with three prediction branches is designed. The algorithm is evaluated on the BIWI and Pandora datasets. The experimental results show that the average prediction errors on BIWI and Pandora are about 2° and 6°, respectively, which compares favorably with similar algorithms.

(3) A head pose estimation algorithm based on progressive deep fusion of multimodal information is proposed. The method combines RGB images with depth images to predict head pose. First, following the heterogeneous dual-stream backbone in FSANet, a homogeneous dual-stream backbone network is constructed. Then, in order to achieve progressive, deep fusion of the multimodal information, an information interaction mechanism between the two streams is proposed, so that the multimodal information is preliminarily fused during the feature extraction stage. Finally, a feature fusion module that introduces channel shuffling and an attention mechanism is proposed to deeply fuse the feature maps output by the feature extraction network, as sketched below.
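The shuffle-then-attend fusion described in (3) can be illustrated with a minimal PyTorch sketch. The module and parameter names below are illustrative assumptions, not the thesis implementation: two same-width feature maps are concatenated, their channels are interleaved by a channel shuffle, and a squeeze-and-excitation style gate reweights the mixed channels before projecting back to a single-stream width.

```python
# Hypothetical sketch of a shuffle-then-attend fusion block for two feature
# streams (RGB and depth). Names and sizes are illustrative, not the thesis code.
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups (as in ShuffleNet)."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


class ShuffleAttentionFusion(nn.Module):
    """Concatenate two streams, shuffle channels to mix the modalities, then
    reweight channels with a squeeze-and-excitation style attention gate."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        fused = 2 * channels
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(fused, fused // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(fused // reduction, fused, kernel_size=1),
            nn.Sigmoid(),
        )
        self.project = nn.Conv2d(fused, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, depth_feat], dim=1)   # stack the two modalities
        x = channel_shuffle(x, groups=2)               # interleave RGB/depth channels
        x = x * self.attention(x)                      # channel attention reweighting
        return self.project(x)                         # back to single-stream width


# Example: fuse two 512-channel feature maps from the dual-stream backbone.
fusion = ShuffleAttentionFusion(channels=512)
out = fusion(torch.randn(2, 512, 7, 7), torch.randn(2, 512, 7, 7))  # -> (2, 512, 7, 7)
```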
The algorithm is evaluated on the BIWI dataset. The average prediction error on BIWI is kept below 2°, which compares favorably with similar algorithms and with the first two algorithms in this paper.
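For concreteness, the classification-guided angle regression mentioned in (1) can be sketched as follows. This is a minimal sketch assuming a soft-binning formulation: each angle branch outputs logits over discrete angle bins, the continuous angle is recovered as the softmax expectation over the bin centres, and a predicted offset refines the result. The bin layout, feature size, and module names are illustrative assumptions, not the exact thesis design.

```python
# Assumed sketch (not the thesis code) of one per-angle head: classification
# over angle bins guides regression, and a learned offset refines the value.
import torch
import torch.nn as nn


class AngleHead(nn.Module):
    """One per-angle branch (yaw, pitch, or roll) on top of a shared backbone."""

    def __init__(self, in_features: int, num_bins: int = 66, bin_width: float = 3.0):
        super().__init__()
        self.bin_logits = nn.Linear(in_features, num_bins)  # classification over bins
        self.offset = nn.Linear(in_features, 1)             # residual offset term
        # Bin centres covering roughly [-99°, +99°] for 66 bins of width 3°.
        centres = (torch.arange(num_bins, dtype=torch.float32) - num_bins / 2 + 0.5) * bin_width
        self.register_buffer("centres", centres)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        probs = torch.softmax(self.bin_logits(feat), dim=1)          # (N, num_bins)
        expected = (probs * self.centres).sum(dim=1, keepdim=True)   # soft-argmax angle
        return expected + self.offset(feat)                          # refined angle (N, 1)


head = AngleHead(in_features=1280)    # pooled feature size is an assumption
angle = head(torch.randn(4, 1280))    # -> (4, 1) predicted angles in degrees
```

In the multi-branch design of (1), three such heads (plus the classification branch that supervises the bin logits) would sit on the shared GhostNet features, one per angle.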