Research On Algorithm Of Temporal Action Detection

Posted on: 2021-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Qin
Full Text: PDF
GTID: 2428330623468348
Subject: Engineering
Abstract/Summary:
Action Recognition requires a trimmed video segment as input, but videos collected in practice are untrimmed, so this requirement differs considerably from real conditions. Temporal Action Detection was therefore proposed to locate the temporal boundaries of actions in the original untrimmed video and to classify them. Although the task was proposed only recently, it has quickly become a hot topic in video understanding because of its practical significance. This thesis studies Temporal Action Detection under both end-to-end and non-end-to-end network frameworks. The main contents are as follows:

1. The feature network and the task network are studied and selected. Because the task is closely related to Action Recognition, a study of action recognition algorithms leads this thesis to focus on automatic networks and 3D convolutional networks. Because the task also resembles object detection, a study of object detection algorithms leads to a two-stage task network framework, with emphasis on how to generalize the two-dimensional algorithms to the spatiotemporal domain.

2. A non-end-to-end Temporal Action Detection algorithm is studied and improved. The algorithm uses 3DResNeXt as the feature network, and trains and optimizes Action Proposal Generation and Action Classification independently. The Action Proposal Generation network converts temporal boundary localization into binary classification problems: for each position of the video feature, whether it lies inside an action, at a start time, or at an end time. Based on the network output, start and end time points are combined according to certain rules, and the final action proposals are obtained through NMS. In addition, non-local modules and dilated convolutions are introduced into the proposal network to mine the video feature information more deeply, and comparative experiments are performed. The Action Classification network reuses the features extracted from the original video according to the action segment information generated by the proposal network, and classifies each action proposal into its action category. Finally, the length of the sliding window is explored experimentally.

3. An end-to-end Temporal Action Detection algorithm is studied and improved. The algorithm uses C3D and 3DResNeXt as feature extraction networks, and trains and optimizes Action Proposal Generation and Action Classification jointly. The Action Proposal Generation network first maps each point on the extracted feature map back to the original video and sets a series of anchors along the time dimension of the original video. Anchor binary classification and temporal boundary regression are then completed through the proposal network, and the final action proposals are obtained through NMS. In addition, the anchors are modified to further mine the deep feature information of the video. In the Action Classification network, the proposal segments of arbitrary length are sent to the 3D ROI Pooling layer, which turns them into features of equal length; the recognition network then uses these features for specific action classification and precise temporal boundary adjustment. Finally, the network structure is improved by extracting features with 3DResNeXt, which is compared with the basic network, and the anchor settings are explored experimentally.
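Both algorithms in the abstract obtain their final action proposals through NMS. The following is a minimal sketch of greedy non-maximum suppression along the time axis, not the thesis's actual implementation. The proposal layout `(start, end, score)`, the function names `temporal_iou` and `temporal_nms`, and the IoU threshold of 0.5 are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical proposal layout: (start_time, end_time, confidence_score).
Proposal = Tuple[float, float, float]


def temporal_iou(a: Proposal, b: Proposal) -> float:
    """Intersection over union of two segments on the time axis."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def temporal_nms(proposals: List[Proposal],
                 iou_threshold: float = 0.5) -> List[Proposal]:
    """Greedy NMS: visit proposals from highest to lowest score and keep
    one only if it does not overlap an already kept proposal by more than
    iou_threshold (an assumed value, not taken from the thesis)."""
    kept: List[Proposal] = []
    for candidate in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(temporal_iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    candidates = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
    for start, end, score in temporal_nms(candidates):
        print(f"{start:.1f}s to {end:.1f}s, score {score:.2f}")
```

In this example the second candidate overlaps the first with a temporal IoU of 9/11 and is suppressed, while the third does not overlap either and is kept. Other variants, such as score-decaying Soft-NMS, could be substituted; the abstract does not say which form is used.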
Keywords/Search Tags:Action Recognition, Temporal Action Detection, End-to-End, Action Proposal Generation, Action Classification