Eye Tracking Technique Based On Deep Multimodal Learning

Posted on: 2021-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: S Chen
Full Text: PDF
GTID: 2428330614458588
Subject: Optical Engineering
Abstract/Summary:
Gaze direction indicates what a person is looking at and interested in. Eye tracking is often used to analyze human intentions and is widely applied in computer vision, computer graphics, psychology, sociology, and human-computer interaction. The research on eye tracking in this thesis therefore has both theoretical value and broad application prospects.

First, the overall scheme of the eye tracking system is designed, and candidate face detection methods for the scheme are analyzed. The face detection method based on Multi-task Cascaded Convolutional Networks (MTCNN) is selected. It also yields five facial feature points (left and right pupils, nose tip, and left and right mouth corners), and the detected faces are then scaled. The head pose is estimated with the Active Appearance Model (AAM) and the POSIT (Pose from Orthography and Scaling with Iterations) algorithm: AAM detects feature points on the detected face, and POSIT determines the head pose from the correspondence between the feature points in the face image and the three-dimensional points of a standard face model. The eye region is located from the detected eye corner feature points, which yields the RGB image and the depth image of the eye.

Second, appearance-based eye tracking methods have higher error when the head moves freely, and eye tracking datasets are small, which leads to network overfitting. To address this, a deep multimodal learning eye tracking model based on transfer learning is designed, building on the theory of deep multimodal learning and using pre-trained convolutional neural networks (CNN). Specifically, the pre-trained CNN model extracts feature maps from the RGB image and the depth image of the eye, and the head pose and the two feature maps are fused automatically at the fully connected layer of the CNN to perform eye tracking. Experiments show that the designed model estimates gaze direction more effectively, with lower estimation error, than single-modality models, and that introducing transfer learning both reduces the estimation error and accelerates model training.

Next, the receptive field of a convolutional neural network usually reflects the learning capacity of the CNN, but it is limited by the size of the convolution kernel, and enlarging it with pooling operations loses spatial information in the feature map. Because dilated convolution can enlarge the receptive field without this information loss, a deep multimodal learning eye tracking model based on dilated convolution is proposed, in which dilated convolution is used to further improve ResNet-50. Experiments show that dilated convolution further improves the performance of the designed model, and a comparison with a CNN-based eye tracking model shows the advantages of the model in this thesis.

To facilitate driving service robots by eye tracking, a classification-based eye tracking model is designed to predict five points of regard, and tests are conducted on self-built datasets. The experiments show that the classification-based model can effectively identify the point of regard. Finally, a human-robot interaction platform based on the classification eye tracking system is built on an intelligent service robot platform. The experimental results show that the eye tracking technology proposed in this thesis can drive the service robot to move, and that it is effective and practical.
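
The trade-off between pooling and dilated convolution described in the abstract can be made concrete with a small receptive-field calculation. The sketch below is illustrative only and is not taken from the thesis: the function name `receptive_field` and the example layer stacks are assumptions, and the layer settings are not the ResNet-50 configuration used by the author. It applies the standard one-dimensional recurrence in which a layer with kernel size k, stride s, and dilation d has an effective kernel of d(k - 1) + 1.

```python
def receptive_field(layers):
    """Return (receptive_field, output_stride) for a stack of layers.

    `layers` is a list of hypothetical (kernel, stride, dilation) tuples,
    ordered from input to output. The calculation is per spatial axis.
    """
    rf = 1      # one input pixel sees only itself
    jump = 1    # distance, in input pixels, between neighboring outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


# Two plain 3x3 convolutions: small receptive field, full resolution.
plain = [(3, 1, 1), (3, 1, 1)]

# 3x3 conv, 2x2 pooling with stride 2, 3x3 conv: the receptive field grows,
# but the feature map is downsampled by a factor of 2.
pooled = [(3, 1, 1), (2, 2, 1), (3, 1, 1)]

# 3x3 conv followed by a 3x3 conv with dilation 2: a comparable receptive
# field with no downsampling.
dilated = [(3, 1, 1), (3, 1, 2)]

print(receptive_field(plain))    # (5, 1)
print(receptive_field(pooled))   # (8, 2)
print(receptive_field(dilated))  # (7, 1)
```

In this toy comparison the pooled stack reaches a receptive field of 8 at the cost of halving the spatial resolution, while the dilated stack reaches 7 with the output stride still at 1, which is the property the abstract relies on.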
Keywords/Search Tags: eye tracking, head pose estimation, deep multimodal learning, dilated convolution, service robot