Gaze direction carries rich information about attention and may reflect underlying cognitive processes. Gaze estimation aims to estimate the gaze direction and gaze target of the human eyes. It can be applied in a wide range of scenarios, such as clinical research, human-computer interaction, and education. Although some companies have developed mature commercial applications for gaze estimation, most of them rely heavily on specific hardware devices, which limits the scenarios in which they can be used. Appearance-based gaze estimation methods have therefore attracted growing attention because of their simple equipment requirements and fast tracking speed. The development of deep learning has further increased researchers' interest in them. A series of deep-learning-based appearance-based gaze estimation algorithms has emerged and achieved good performance, but they remain far from commercial application. Improving the performance of appearance-based gaze estimation is therefore an important research problem. At the same time, training deep learning models depends on gaze datasets, which are scarce. Existing datasets also have problems such as inaccurate ground-truth labels, narrow coverage of head movements, a single camera view per subject, and heavy restrictions on head movement during collection. This paper therefore studies the problem of person-specific bias in current appearance-based gaze estimation algorithms and proposes a novel network to reduce it. It also studies how to build a better gaze estimation dataset, supplementing the scarce gaze data with more accurate ground-truth labels.

In this paper, we review the background and development of appearance-based gaze estimation, which is currently a mainstream approach to gaze estimation. The field has shortcomings including scarce gaze datasets and poor ground-truth precision. To address these, we present a new gaze dataset, SJTUgaze, which
contains 127,495 face images with corresponding ground-truth gaze directions from 16 subjects under 4 camera views. Compared with existing gaze datasets, SJTUgaze has the following advantages. First, SJTUgaze uses a Tobii X120 eye tracker to record the 3D eye positions of subjects and their corresponding gaze points on the screen. The videos and eye tracker data are accurately synchronized, face images of the corresponding frames are extracted, and the Tobii gaze data is transformed from the user coordinate system of the eye tracker to the camera coordinate system according to the equipment calibration information. SJTUgaze therefore has more accurate and reliable gaze direction vector labels. Second, SJTUgaze covers a wider range of head poses and gaze directions. We place four GoPro cameras to the left of, to the right of, above, and below the television screen to record video while subjects watch stimuli from 140 ~ 175(88) away from the TV screen, producing as wide a variety of head poses as possible. In this way, SJTUgaze obtains a large number of face images with different head poses. Third, with its four camera views, SJTUgaze is a multiview gaze dataset, which helps make up for the current scarcity of multiview gaze data. Experimental results comparing SJTUgaze single-view data with the combined four-view data validate that our dataset can be used for multiview gaze estimation research. In addition, subjects are allowed to move their heads freely during data collection, which is more realistic and natural. SJTUgaze also provides additional eye movement event category labels, which can support related research beyond gaze estimation, such as eye movement classification.

Because of person-specific characteristics of human eyes, such as differences in the angle between the visual axis and the optical axis and differences in eye appearance, a trained gaze estimation model performs poorly on data from a new subject. Person-specific bias is one important reason for this. To improve the accuracy of gaze estimation, this paper proposes a
novel gaze estimation model. The proposed model, based on meta-learning, can extract more common and transferable characteristics of human eyes. Once training is complete, the model can use fewer than 20 calibration gaze samples from a subject to learn that subject's specific characteristics and quickly update its parameters. The proposed model thus greatly reduces person-specific bias and achieves better gaze estimation performance. Experiments on the public MPIIGaze and Eyediap datasets and on the proposed SJTUgaze dataset show that the proposed model obtains the lowest mean angular gaze errors. Cross-dataset experiments between MPIIGaze and SJTUgaze further show that the proposed model generalizes well. Finally, extensive comparison experiments validate that our model estimates gaze more accurately.
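
For readers unfamiliar with the quantities named above, the following is a minimal sketch, not the paper's implementation, of how a gaze direction label can be derived from a 3D eye position and a gaze point, and how the mean angular gaze error can be computed. It assumes the eye-tracker-to-camera calibration is available as a rigid transform (a rotation matrix and a translation vector). All function and parameter names (`to_camera_frame`, `rotation`, `translation`, and so on) are hypothetical, and the numeric values in the usage note are illustrative only.

```python
import math


def to_camera_frame(point, rotation, translation):
    """Map a 3D point from the eye-tracker frame to the camera frame (R p + t).

    `rotation` is a 3x3 nested sequence and `translation` a length-3 sequence;
    both are assumed to come from equipment calibration.
    """
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def gaze_direction(eye_pos, gaze_point):
    """Unit vector pointing from the 3D eye position to the gaze point."""
    diff = [g - e for g, e in zip(gaze_point, eye_pos)]
    norm = math.sqrt(sum(d * d for d in diff))
    return tuple(d / norm for d in diff)


def angular_error_deg(g_pred, g_true):
    """Angle in degrees between a predicted and a ground-truth gaze vector."""
    dot = sum(a * b for a, b in zip(g_pred, g_true))
    norm = math.sqrt(sum(a * a for a in g_pred)) * math.sqrt(
        sum(b * b for b in g_true)
    )
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, truths):
    """Average angular error over paired predictions and ground-truth vectors."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, truths)]
    return sum(errors) / len(errors)
```

As a usage illustration, with an identity rotation and a translation of `(0, 0, 100)`, the point `(1, 2, 3)` maps to `(1, 2, 103)` in the camera frame. The eye position and the gaze point would both be transformed this way before `gaze_direction` is applied, so that the label is expressed in camera coordinates.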