| Gaze tracking has a wide range of applications in areas such as human-computer interaction, education, criminal investigation, medicine, and business. In recent years, driven by the rapid development of computer vision and deep learning, research on gaze tracking has steadily increased. However, current gaze tracking systems still have shortcomings. For example, it remains a challenge to improve the prediction accuracy of gaze direction under varying lighting conditions and head poses while reducing system complexity and simplifying the calibration procedure. This paper builds its algorithm model around three issues in gaze tracking: accuracy, robustness, and network complexity. We propose CRAB (Combine ResNet and Bi-LSTM), a deep learning-based gaze tracking algorithm that combines a ResNet convolutional neural network with a Bi-LSTM network, and then improve the algorithm to raise its gaze tracking accuracy. Firstly, a deep learning-based approach to gaze tracking is applied and investigated. Convolutional neural networks are widely used in computer vision because they connect layers through convolution operations, which reduces the computational effort and, in turn, the equipment cost. On this basis, this paper establishes an algorithm model based on convolutional neural networks. Secondly, to address gaze tracking accuracy, the paper improves the basic CRAB algorithm. To better extract features from the input, it adds a spatial weights module and a CBAM attention mechanism to CRAB, both of which enhance the extraction of eye features and suppress the influence of other facial regions on accuracy; in addition, the Bi-LSTM network in the CRAB algorithm provides a way to model sequences to fit the temporal 
information, which can also improve the accuracy of the predicted gaze direction. Thirdly, recent work in gaze tracking has found that using the full-face region as the network input can improve algorithm performance. Considering the practical application of gaze tracking and the idea that information derived from the full-face region can improve performance, the model in this paper takes consecutive frames of a video sequence containing full-face information as input. The Gaze360 dataset, which combines 3D gaze annotations, head poses with a large range of movement, and a wide variety of data acquisition environments, is chosen for the experiments; because of the diversity of this dataset, the results indicate that the proposed CRAB algorithm has good robustness. Finally, to improve gaze tracking accuracy and robustness while reducing network complexity, this paper experiments with improving the backbone network using the ResNet series and its variants. The experimental results show that the highest accuracy in the current experiments is obtained when ResNeSt-50 is used as the backbone network; ResNeSt-50 is a lightweight, attention-based network architecture that maintains accuracy without sacrificing the lightness of the network. The results further show that the CRAB algorithm with ResNeSt-50 as the backbone achieves its best results after the spatial weights module is added: the angular error of the predicted gaze direction is reduced from 13.5° to 12.6°, a reduction of 0.9°, which further improves the accuracy of the model. |
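
The angular error reported above is not defined in the abstract. A common convention in gaze estimation, assumed here, is the angle between the predicted and ground-truth 3D gaze direction vectors, averaged over all samples. The sketch below illustrates that convention using only the Python standard library; the function names `angular_error_deg` and `mean_angular_error_deg` are hypothetical and are not taken from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors.

    Assumption: the metric is the angle between predicted and
    ground-truth gaze vectors; the abstract does not state the formula.
    """
    dot = sum(p * t for p, t in zip(pred, target))
    norm_p = math.sqrt(sum(p * p for p in pred))
    norm_t = math.sqrt(sum(t * t for t in target))
    if norm_p == 0.0 or norm_t == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_sim = max(-1.0, min(1.0, dot / (norm_p * norm_t)))
    return math.degrees(math.acos(cos_sim))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground truths."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal in length")
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Illustrative vectors only, not data from the paper.
    preds = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
    targets = [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0)]
    print(mean_angular_error_deg(preds, targets))
```

Under this convention, the reported improvement from 13.5° to 12.6° would correspond to the mean of such per-sample angles dropping by 0.9° on the evaluation set.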