Study On Gaze Tracking In Unconstrained Environments

Posted on: 2024-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Ma
Full Text: PDF
GTID: 2558306920464844
Subject: Control Science and Engineering
Abstract/Summary:
Gaze tracking is a technology that automatically estimates a person's gaze direction or gaze point, and it is an important research branch in human-computer interaction and perception, virtual reality, military applications, and intelligent driving. At present, gaze tracking depends on expensive and complex hardware such as infrared cameras and depth cameras, which limits its application scenarios and degrades the user experience. Because of the variability of individual eye appearance, the diversity of application scenarios, and the complexity of head pose and gaze direction, estimating the gaze direction or gaze point in unconstrained environments still requires in-depth research. Given the excellent performance of deep learning on visual detection tasks in complex environments, deep-learning-based gaze estimation has become the mainstream approach. To address the above issues, this thesis proposes an appearance-texture-based method for gaze direction estimation and gaze point estimation in unconstrained environments using a monocular webcam. The main research contents are as follows:

(1) Head roll angle estimation. A tilted face in the input image hinders training of the gaze estimation algorithm and degrades its estimation accuracy. Because face tilt in the image is caused by head roll, the facial keypoints detected by the RetinaFace algorithm are used to estimate the head roll angle separately, even in low-resolution images. This eliminates the effect of the head roll angle on the performance of the gaze estimation algorithm and simplifies the gaze estimation task.

(2) Two-dimensional Euler angle gaze direction estimation algorithm. Because the eye itself rotates only in pitch and yaw, and the head roll angle is estimated separately, the 3D gaze direction can be represented by two Euler angles (pitch and yaw). In this study, two fully connected layers regress the two gaze angles separately to improve the prediction accuracy of each angle. Two independent loss functions are used to train pitch and yaw respectively; each loss function is a weighted combination of cross-entropy and mean squared error, which fine-tunes the network precisely and improves its generalization ability. The algorithm uses ResNet50 as the backbone network and embeds the SE-Net attention mechanism to enhance robustness and improve estimation accuracy. The algorithm achieves an average speed of 16 fps and a mean angular error of 3.8° on the test set.

(3) Two-input gaze point estimation algorithm. Building on the gaze direction estimation network, this thesis adds a face-grid (face reticle) branch network, fuses the feature information of the face image and the face grid with a fully connected layer, and then regresses the horizontal and vertical coordinates of the on-screen gaze point independently with two fully connected layers. This algorithm simplifies the network structure of multi-input gaze point estimation algorithms, and achieves a mean test error of 4.3 cm and an average speed of 14 fps.
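The geometry behind items (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes the roll angle is taken from the line joining the two eye keypoints (RetinaFace returns five facial landmarks, including both eye centers), and it assumes one common camera-coordinate convention for turning (pitch, yaw) into a 3D unit vector. The function names roll_angle_deg, gaze_vector, and angular_error_deg are hypothetical.

```python
import math


def roll_angle_deg(left_eye, right_eye):
    """Estimate in-plane head roll from two eye centers given as (x, y) pixels.

    Assumption: image coordinates, x grows rightward and y grows downward;
    left_eye is the eye that appears on the left side of the image.
    A positive result means the right-hand eye sits lower in the image.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def gaze_vector(pitch_deg, yaw_deg):
    """Convert (pitch, yaw) in degrees to a 3D unit gaze vector.

    Assumed convention (not stated in the abstract): the camera looks
    along +z, so a gaze aimed straight at the camera is (0, 0, -1).
    """
    pitch = math.radians(pitch_deg)
    yaw = math.radians(yaw_deg)
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)


def angular_error_deg(g1, g2):
    """Angle in degrees between two 3D gaze vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))
```

Under these assumptions, rotating the face crop by the negative of roll_angle_deg would level the eyes before gaze estimation, and angular_error_deg gives the kind of angle deviation that the abstract reports as a test metric.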
Keywords/Search Tags: Gaze estimation, Gaze point estimation, Deep learning, Human-computer interaction