Egocentric video analysis, the analysis of videos captured by first-person cameras, is an emerging field in computer vision. In recent years, with the availability of wearable cameras such as GoPro, Google Glass, and Microsoft SenseCam, and their growing use in recording daily life, assisted living, and smart homes, the number of egocentric videos has increased rapidly. First-person video contains a large number of object-manipulation actions, and its main challenges are irrelevant regions of interest and cluttered backgrounds. State-of-the-art egocentric video analysis methods therefore mainly use deep network frameworks that separate the hand region from the action area to perform activity recognition. In practical applications such as smart homes and assisted living, however, activity prediction offers even better prospects. Existing methods rely on synchronous sequence network frameworks, which have two limitations: first, they cannot model long-term temporal dependencies, so they cannot predict future events; second, they cannot distinguish which frames are more important to attend to, so redundant and noisy frames strongly degrade their results.

Motivated by these limitations, this thesis proposes a two-stream LSTM with asynchronous and synchronous branches, using a point process as the mathematical model. It focuses on the effect of gaze points on the long-term asynchronous event sequence and on how to handle redundant and noisy frames. Eye movements usually reflect a person's thinking process, and a person's movements follow the trajectory of eye movement to a certain extent. Accordingly, in this thesis an asynchronous event is defined as the gaze point moving into or out of the manipulated object in a video frame, which is closely related to the start or end of an activity. To counter the influence of redundant frames and noise on the results, this thesis proposes an attention-score model framework: each frame in a video sequence is assigned a score, and the decisive factor is the asynchronous event sequence. This event-modulated attention model improves experimental accuracy and enhances model robustness.

This thesis makes the following contributions: (1) a deep recurrent network model combining gaze-driven synchronous and asynchronous modules, built on the point-process conditional intensity function; (2) gaze information applied not only as a visual-saliency feature in the synchronous model but also as an asynchronous event driver, modeling the interaction between actions over long time sequences; (3) a score assigned to each frame of the input sequence to reduce the influence of redundant and noisy frames. We conducted comprehensive evaluations on two public egocentric datasets, GTEA Gaze and GTEA Gaze+. The results show that the proposed egocentric video analysis method outperforms state-of-the-art algorithms.
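The event-modulated attention idea described above can be illustrated with a minimal sketch. The code below assumes a Hawkes-style conditional intensity (base rate plus exponentially decaying influence of past gaze events); the function names, the placeholder frame features, and the specific parameter values (`mu`, `alpha`, `beta`) are illustrative assumptions, not the thesis's actual model. Each frame is scored by the event intensity at its timestamp, the scores are normalized with a softmax, and the frame features are pooled by the resulting attention weights, so frames near a gaze event dominate while redundant frames far from any event are down-weighted.

```python
import numpy as np

def intensity(t, events, mu=0.1, alpha=1.0, beta=2.0):
    """Hawkes-style conditional intensity: a base rate mu plus an
    exponentially decaying contribution from each past gaze event."""
    return mu + alpha * sum(np.exp(-beta * (t - e)) for e in events if e < t)

def event_modulated_attention(frame_times, frame_feats, events):
    """Score each frame by the event intensity at its timestamp,
    normalize the scores with a softmax, and pool the frame
    features by the resulting attention weights."""
    scores = np.array([intensity(t, events) for t in frame_times])
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    pooled = weights @ frame_feats            # attention-weighted feature
    return weights, pooled

# Toy usage: 5 frames; gaze enters/leaves the object at t=1.0 and t=3.0.
times = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
feats = np.eye(5)                             # placeholder per-frame features
w, pooled = event_modulated_attention(times, feats, events=[1.0, 3.0])
```

Frames sampled just after a gaze event (here, at t=1.5 and t=3.5) receive higher weights than frames far from any event, which is the mechanism by which the asynchronous event sequence decides which frames the model attends to.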