Action Recognition And Localization Based On Deep Learning

Posted on: 2021-01-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Zhu  Full Text: PDF
GTID: 2428330647952410  Subject: Control Engineering
Abstract/Summary:
With the unprecedented development of deep learning in the image domain, attention is increasingly turning to video. Video understanding has become an important topic in artificial intelligence and computer vision. It mainly covers two tasks: trimmed action recognition and untrimmed temporal action localization. Trimmed action recognition determines whether a trimmed video belongs to a specific predefined action category. Untrimmed temporal action localization locates each action instance in an untrimmed video and finds its temporal boundaries. Both tasks have broad application prospects in intelligent monitoring, medical health, autonomous driving, and robotics.

Although great progress has been made, both tasks still suffer from problems such as viewpoint change and spatio-temporal modeling. First, videos are affected by camera motion, viewpoint change, noisy backgrounds, and diverse temporal gaps, so it is hard to locate an action instance precisely. Second, because an action is a process of motion evolution along the temporal dimension, temporal information is vitally important for action recognition and localization; how to make full use of temporal context and how to explore spatio-temporal relations deserve further study. To address these problems, we propose two methods. The main contributions are as follows:

We propose a multi-view attention algorithm for trimmed action recognition. By introducing our multi-view attention mechanism into a basic 3D convolutional neural network, the new model learns fine-grained spatio-temporal information adaptively. We also use optical flow and the two-stream method to further improve recognition accuracy. We validate the effectiveness of our method on two benchmark datasets.

We propose a collaborative local-global learning algorithm for untrimmed temporal action localization. Early methods focus on local features, but the time spans of different actions vary widely, and fixed geometric structures may not capture such diverse global information. We therefore propose a collaborative local-global learning algorithm that combines local and global features. Besides the diverse temporal gaps, factors such as viewpoint change and camera motion also degrade temporal boundary localization. To address this, we introduce an attention mechanism, which enables the network to focus on actions. We validate the effectiveness of our method on two benchmark datasets.
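As an illustration of two ideas mentioned in the abstract, the following is a minimal sketch and not the thesis implementation. The abstract does not say how the RGB and optical-flow streams are combined or how boundary quality is measured, so the sketch assumes two common choices: weighted averaging of per-class probabilities (late fusion) and temporal intersection-over-union between a predicted and a ground-truth segment. The function names and the flow_weight parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_two_stream_scores(
    rgb_probs: Sequence[float],
    flow_probs: Sequence[float],
    flow_weight: float = 0.5,
) -> List[float]:
    """Late fusion (assumed here): weighted average of per-class probabilities
    from an RGB stream and an optical-flow stream."""
    if len(rgb_probs) != len(flow_probs):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= flow_weight <= 1.0:
        raise ValueError("flow_weight must lie in [0, 1]")
    return [
        (1.0 - flow_weight) * r + flow_weight * f
        for r, f in zip(rgb_probs, flow_probs)
    ]


def temporal_iou(pred: Tuple[float, float], truth: Tuple[float, float]) -> float:
    """Temporal IoU of two (start, end) segments, e.g. in seconds."""
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    fused = fuse_two_stream_scores([0.7, 0.3], [0.2, 0.8])
    predicted_class = max(range(len(fused)), key=fused.__getitem__)
    print("fused scores:", fused, "predicted class:", predicted_class)
    print("tIoU:", temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

In this sketch a predicted segment would count as a correct localization when its temporal IoU with a ground-truth instance exceeds a chosen threshold; the threshold itself is a design choice that the abstract does not specify.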
Keywords/Search Tags:Deep learning, Action localization, Action recognition, Attention mechanism, Spatio-temporal modeling