Research And Application Of Head Pose Estimation Based On Label Distribution

Posted on: 2020-06-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L H Xu
GTID: 1368330578476508
Subject: Education IT
Abstract/Summary:
The development of information technology is injecting new vitality into the reform and innovation of education and provides a novel path toward education modernization. An information-based teaching environment is now largely in place, and information technology, represented by multimedia and networks, is used extensively, enriching the forms of teaching content and teaching activities. However, teaching is still mostly confined to the traditional "transmission-acceptance" mode, in which the central role of students is not reflected, to the detriment of their individualized development. To reflect that role fully, it is necessary to grasp students' learning state or interest during teaching. Attention objectively reflects a student's learning state or interest and is essential to individualized teaching. Head pose represents an individual's head orientation, which to a large extent reflects the direction of attention. To capture students' attention, this dissertation therefore studies head pose estimation algorithms.

Head pose estimation is the task of estimating people's head orientations in digital images using computer vision and pattern recognition techniques, and it has attracted growing attention from researchers because of its wide range of applications. Although performance has improved significantly, it still falls short of what practical applications require. The key factors that affect the performance of head pose estimation algorithms are: (1) the accuracy of labels: reasonable and truthful labels are the premise of algorithm verification; (2) the validity of features: enhancing effective features and reducing interfering factors are the key to improving performance; (3) the generalization of methods: generalization ability is what makes the algorithms usable in practice. The corresponding challenges are: (1) the difficulty of data annotation: accurate head pose angles are hard to obtain in natural scenes; (2) the interference from facial identity information: the similarity in appearance between images of the same individual clearly overwhelms the similarity between images of different individuals in the same head pose; (3) weak generalization: a model trained on one dataset is prone to serious performance degradation when applied to another dataset.

To address these challenges, this dissertation first proposes a weak learning strategy for constructing label distributions, which narrows the gap between the constructed label distributions and the ground-truth ones. A regularized convolutional neural network is then proposed to learn robust features, and a deep multi-task learning method, in which face identification serves as an auxiliary task, is developed to reduce the influence of facial identity on learning head pose features. These two methods improve head pose estimation accuracy significantly. Finally, a fusion method of classification and regression based on label distribution is proposed to address the degradation of head pose estimation performance across datasets, and the practicability of this method is preliminarily verified in classroom scenes. The main contributions of the dissertation are as follows:

(1) To address the problem that a label distribution generated by a Gaussian function cannot reasonably describe the real label distribution, this dissertation proposes a weak learning strategy that learns, in a data-driven way, an approximately reasonable distribution for each image in the training set; the learned label distributions are then used as supervision in the subsequent learning stages. The performance of traditional label distribution-based methods degrades significantly when only partial label information is available, for example when only the horizontal angle is known. For this reason, regularization terms, as well as positive correlation and negative constraint, are further introduced into the loss function to improve the performance of the learned model.

(2) To improve the accuracy of head pose estimation against a flat background, this dissertation proposes a novel head pose estimation framework. The framework adopts a lightweight and robust convolutional neural network architecture composed of one backbone net and three sub-nets; it takes the whole image as input and the label distribution as supervisory information. Two types of objective function (KL divergence and Jeffreys divergence losses) are used to optimize the architecture. Experimental results show that the framework learns deep feature representations with complementary characteristics and is able to mine the more discriminative class-specific regions.

(3) To alleviate the influence of facial identity information on head pose estimation, we propose a deep multi-task learning framework that combines head pose estimation and face verification, where head pose estimation is the main task and face verification is the auxiliary task. Because the discriminative features for face verification lie mainly in the face area, while head pose estimation should take as much of the whole head region as possible as input, the two tasks are given different image areas as input. Common feature representations are shared by the two tasks in the bottom layers of the network, are then separated by a data separation module, and are sent to their respective task branches. Finally, two loss functions weighted by a tradeoff coefficient are used as supervision to optimize the architecture.

(4) To implement head pose estimation in large classroom scenes, we propose a fusion method of classification and regression based on label distribution and train the model on a large-scale synthetic dataset; the model achieves excellent performance in a cross-dataset experiment on a dataset acquired in classroom scenes. Moreover, we present a more reasonable geometric model relating attention, a person's position in the world coordinate system, and the estimated head pose angles, which is finally applied successfully to recognizing students' attention in classroom scenes.
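
For readers unfamiliar with label distributions, the following is a minimal sketch, under stated assumptions, of the baseline ideas the abstract refers to: a Gaussian label distribution over discretized yaw angles (the construction that contribution (1) aims to improve on), and the KL and Jeffreys divergences named in contribution (2). The bin grid, the `sigma` value, all function names, and the expectation-based decoding in `expected_angle` are illustrative assumptions; the abstract does not specify the dissertation's learned distributions or its fusion method, and none of this is taken from it.

```python
import numpy as np


def gaussian_label_distribution(angle, bins, sigma=15.0):
    """Discretized Gaussian centered on the annotated angle, normalized to sum to 1.

    `sigma` (in degrees) is an assumed hyperparameter, not a value from the dissertation.
    """
    logits = -0.5 * ((bins - angle) / sigma) ** 2
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return p / p.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same bins."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def jeffreys_divergence(p, q, eps=1e-12):
    """Jeffreys divergence: the symmetrized sum KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q, eps) + kl_divergence(q, p, eps)


def expected_angle(p, bins):
    """Decode a continuous angle as the expectation of the bin centers under p.

    This is one common way to turn a predicted distribution into a regression
    output; it is an assumption here, not the dissertation's fusion method.
    """
    return float(np.dot(p, bins))


# Hypothetical yaw grid: 13 bins from -90 to +90 degrees in 15-degree steps.
bins = np.arange(-90.0, 91.0, 15.0)
target = gaussian_label_distribution(30.0, bins)     # supervision for a 30-degree label
predicted = gaussian_label_distribution(45.0, bins)  # stand-in for a network output

loss_kl = kl_divergence(target, predicted)
loss_jeffreys = jeffreys_divergence(target, predicted)
decoded = expected_angle(predicted, bins)
```

In a training loop, `predicted` would come from a softmax layer rather than from a second Gaussian; the Jeffreys loss penalizes disagreement in both directions, whereas the KL loss is asymmetric in its arguments.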
Keywords/Search Tags: head pose estimation, label distribution, regularization, attention