The Action Recognition And Gaze Following Based On Multimodality Information

Posted on:2021-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhu

Full Text:PDF

GTID:2518306248986099

Subject:Computer system architecture

Abstract/Summary:

With the rapid development of intelligent products and the Internet, large amounts of data of varying quality are being produced, most of it video. Intelligent algorithms offer an efficient, low-cost way to check video content, especially the actions it contains, so that invalid videos can be filtered out or flagged with an early warning. Current methods, however, recognize action categories from the entire preceding video sequence and cannot infer human intention, although analyzing intention is crucial for recognition and classification. Gaze following is an inherent human ability that helps people interact better with their surroundings, yet it remains difficult for a computer to simulate. If a computer could simulate gaze following, it would help in recognizing incomplete videos.

In action recognition, how appearance information and motion information are used are two decisive factors in recognition precision. Appearance information can be obtained by sparsely sampling video frames, and optical flow can describe motion information. However, most current methods ignore the latent connection between these two kinds of information. To solve this problem, this thesis proposes a three-stream network that acquires richer information, with each stream taking a different modality as input: sampled RGB frames describe appearance information, stacked optical flows describe motion information, and a dynamic image, obtained by rank pooling over the RGB frames, describes spatio-temporal information. The three-stream network can thus draw on multiple modalities of action information in a video sequence to model an action. Experiments on the UCF101 dataset show that the method outperforms previous methods because it acquires richer information.

Current gaze-following methods decompose the task into two sub-tasks: saliency detection and gaze estimation. Such methods can detect salient objects in the field of a person's gaze direction, but they ignore the relevance among the human, the background and the objects. Lacking this relevance information, they are more likely to identify large objects than small ones and cannot handle images with complex backgrounds. To address this problem, this thesis proposes a three-stream network that captures salient objects, gaze estimation, and the relevance among the human, the objects and the background. The saliency stream takes the original image as input. The gaze stream takes the image and position of the person's head as input. The relevance stream takes a relevance matrix containing the spatial relevance among every object, the human and the background. Each stream outputs a vector representing the features of its modality. A large number of experiments were carried out on the GazeFollow dataset, and the ablation results indicate the importance of relevance information for scene understanding. The method achieves state-of-the-art performance on four indicators. Thus, the three-stream network can handle gaze following in complex scenes.
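
As an illustration of the inputs to the action recognition network, the sketch below shows one way to sparsely sample RGB frames and to collapse a frame sequence into a single dynamic image. It is not the thesis's implementation: the abstract does not say which rank pooling variant or sampling scheme is used, so the code assumes the commonly used approximate rank pooling with linear weights 2t - T - 1 and a segment-center sampling rule. The function names `sparse_sample_indices` and `dynamic_image` are hypothetical.

```python
import numpy as np


def sparse_sample_indices(num_frames, num_segments):
    """Return one frame index per segment (the segment center).

    Assumption: the video is split into equal-length segments and the
    middle frame of each segment is taken as the appearance sample.
    """
    return [((2 * k + 1) * num_frames) // (2 * num_segments)
            for k in range(num_segments)]


def dynamic_image(frames):
    """Approximate rank pooling over a stack of frames.

    frames: array of shape (T, H, W) or (T, H, W, C).
    Frame t (1-based) is weighted by alpha_t = 2t - T - 1, so later
    frames count positively and earlier frames negatively. The weights
    sum to zero, which cancels static content and keeps what changes.
    """
    frames = np.asarray(frames, dtype=np.float64)
    T = frames.shape[0]
    alpha = 2 * np.arange(1, T + 1) - T - 1
    pooled = np.tensordot(alpha, frames, axes=(0, 0))

    # Rescale to the 0..255 range so the result can be fed to a network
    # that expects ordinary image input.
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        return np.zeros(pooled.shape, dtype=np.uint8)
    return ((pooled - lo) / (hi - lo) * 255.0).astype(np.uint8)


if __name__ == "__main__":
    video = np.random.rand(12, 4, 4, 3)      # 12 toy RGB frames
    idx = sparse_sample_indices(len(video), 3)
    rgb_samples = video[idx]                 # input to the appearance stream
    dyn = dynamic_image(video)               # input to the dynamic image stream
    print(idx, rgb_samples.shape, dyn.shape)
```

Because the weights sum to zero, a pixel that never changes contributes nothing to the pooled image, which is why a dynamic image summarizes motion and appearance together in one frame-sized input.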

Keywords/Search Tags:

action recognition, gaze following, temporal stream, spatial stream, saliency detection, gaze estimation, relevance information

Related items

1	Research On Spatial-temporal Information For Action Recognition
2	Research On Gaze Tracking Technology Based On Kalman Filter And Image Saliency Detection
3	The Gaze Tracking System Based On Head Rotation Information And Eye Gaze Information
4	Research On Long-distance 3D Gaze Calibration And Eye Gaze Tracking
5	Research On Methods Of Human Eye Gaze Estimation With Head-Mounted Eye Trackers
6	Research On Eye Gaze Estimation Algorithms In Virtual/Augmented Environment
7	Human Eye Gaze Estimation Algorithm Based On Head-mounted Eye Tracker
8	Research On High Precision Gaze Tracking Algorithms And Application In VR System
9	Research On 3D Gaze Estimation Methods Based On Simplified Hardware System
10	Research On Appearance-based Gaze Estimation