| Temporal action detection in video is an active research topic in computer vision. Deep-learning-based methods in particular have made great progress, and the task is of considerable value to society because temporal action detection systems have a wide range of applications in many scenarios. Based on a survey of a large body of domestic and international literature, this thesis first describes the current state of research on temporal action detection in video and its research value, then gives a detailed introduction to the relevant technologies and mainstream methods, including graph convolution, temporal action detection, and temporal action proposal generation. To address the difficulties of temporal action detection in video, the thesis studies the topic of Temporal Action Detection Based on Relation Aware and Global Context. To overcome the limitation that relying on the information of a single proposal places on temporal context acquisition and multi-scale action localization, the Temporal Action Detection via Overall-Local-Aware Graph Network method is proposed to improve the model's fine-grained localization and action recognition ability. To overcome the limitation that relying only on local boundary information places on global semantic acquisition and high-quality temporal action proposal generation, the Temporal Action Proposal Generation via Global Context Aggregation for Temporal Action Detection method is proposed to improve the quality of the temporal action proposals the model generates. Based on these methods, a prototype system for temporal action detection in video is developed. The main research work of this thesis is as follows: (1) Because constructing a graph network from the inherent semantic associations between the overall and local spatio-temporal relationship information of proposals is highly beneficial for temporal action detection, a Temporal Action
Detection via Overall-Local-Aware Graph Network method is proposed. To obtain a richer overall spatio-temporal feature representation of the proposals, the feature similarity and temporal overlap of the action proposals are jointly exploited to construct the overall relation graph reasoning sub-network. To obtain local relation information of the proposals at different time scales, the partial order of the proposals over time is exploited to construct the local relation graph reasoning sub-network, which consists of multiple levels of graphs. Finally, rich overall-local-aware features are obtained for the proposals and used to classify and localize actions. Experiments on two public datasets show that the proposed method localizes actions more accurately and effectively improves action recognition accuracy. (2) Because fusing the rich global context information in the video, which compensates for the limitations of local features, is highly beneficial for high-quality temporal action proposal generation, a Temporal Action Proposal Generation via Global Context Aggregation for Temporal Action Detection method is proposed. To obtain rich global semantics, the global context is first extracted in full by non-local-like operations, then adaptively adjusted to match the fine-grained, adjusted proposal features, and finally aggregated into the proposal features according to the weights. To further generate multi-scale temporal action proposals with the aggregated global context, a temporal pyramid is constructed to combine the global context semantic information with the corresponding local salient features at different time scales. Experiments on two public datasets show that the proposed method generates high-quality temporal action proposals and thereby further improves the performance of temporal action detection. (3) A prototype system for temporal action detection in
video is designed and implemented with the Python language, the PyTorch deep learning framework, and the PyQt graphical interface development toolkit. It covers data preprocessing, model training, and temporal action detection functional modules. The prototype system integrates the two algorithm models of this thesis, offers a good interactive experience and easy operation, and can satisfy users' everyday temporal action detection needs. |
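
The abstract states that the overall relation graph is built from the feature similarity and temporal overlap of the action proposals, but it does not give the exact construction. The following is a minimal sketch of one way such a graph could be formed, not the thesis's actual implementation. The function names, the thresholds `iou_thresh` and `sim_thresh`, and the rule that links two proposals when either criterion is met are all assumptions made for illustration.

```python
import math


def temporal_iou(a, b):
    """Temporal IoU of two segments given as (start, end) pairs."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def build_relation_graph(segments, features, iou_thresh=0.5, sim_thresh=0.8):
    """Return a symmetric 0/1 adjacency matrix over proposals.

    Assumed rule (hypothetical): two proposals are connected when their
    temporal overlap OR their feature similarity reaches its threshold.
    """
    n = len(segments)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1  # self-loop so each node keeps its own feature
        for j in range(i + 1, n):
            linked = (
                temporal_iou(segments[i], segments[j]) >= iou_thresh
                or cosine_similarity(features[i], features[j]) >= sim_thresh
            )
            adj[i][j] = adj[j][i] = int(linked)
    return adj


# Three toy proposals: the first two overlap in time, and the first and
# third have similar features but are disjoint in time.
segments = [(0.0, 10.0), (2.0, 10.0), (20.0, 30.0)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
adjacency = build_relation_graph(segments, features)
```

An adjacency matrix of this kind could then serve as the edge structure for a graph convolution over proposal features. In a PyTorch-based system the same computation would more likely be done with batched tensor operations.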