
Temporal Action Detection Using Dense Dilated Convolutional Network

Posted on: 2020-08-05
Degree: Master
Type: Thesis
Country: China
Candidate: S B Zhu
Full Text: PDF
GTID: 2428330596475069
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the continued advancement of deep learning in computer vision, the era of artificial intelligence has drawn ever closer. In particular, because of its promising applications in security, human-computer interaction, video analysis, and related fields, the temporal action detection task has attracted extensive attention from researchers and has produced a steady stream of results. Unlike action recognition, action detection operates on untrimmed long videos: it must not only output the action category but, more importantly, locate the precise start and end time of each action segment, which makes it a more challenging computer vision task.

The relationship between action recognition and action detection closely resembles the relationship between image classification and object detection. Many powerful network models, such as residual networks, were developed for image classification, and these models have also played a significant role in object detection methods. Similarly, models developed for action recognition, such as two-stream networks, are widely used in temporal action detection. Because of these similarities, many action detection frameworks are modeled on object detection approaches.

The challenges of the action detection task can be roughly summarized in three points. First, the boundary of a target in object detection is usually clear, so a well-defined bounding box can be annotated. The boundary of a temporal action, however, is often unclear, and its start and end times are difficult to pin down at the frame level. Second, action recognition can still perform well using only static image information, without temporal information. In temporal action detection, by contrast, boundary localization depends strongly on temporal information, so temporal information must be incorporated. Finally, the duration of action segments can vary widely. In some datasets, the shortest action segment may last only one second, while the longest lasts more than 10 seconds. This places high demands on the network's ability to capture multi-scale and long-term information.

We propose a novel network module, the Dense Dilated Temporal Network (DDTN), based on the densely connected neural network (DenseNet), to capture multi-scale and long-term information effectively. The module combines the advantage of dilated convolution, which expands the receptive field without losing information, with the advantage of DenseNet, which fuses and propagates information efficiently. To do so, the dilation rate is held constant within each dense module and increases from one module to the next. Experiments show that DDTN improves model performance in both of the current deep-learning approaches to temporal action detection: fine-grained frame-level detection, and detection based on time series with classification and regression.
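
The dilation schedule described in the abstract (constant within a dense module, ascending between modules) can be illustrated with a small receptive-field calculation. The sketch below is only an illustration under assumed values: the dilation rates (1, 2, 4), three layers per module, a kernel size of 3, and the function name are all hypothetical and are not taken from the thesis. It tracks the largest temporal receptive field, in frames, of each feature map that dense connectivity would concatenate at a module's output.

```python
def receptive_fields(dilations_per_block, layers_per_block=3, kernel_size=3):
    """Largest receptive field (in frames) of each feature map that a
    densely connected block concatenates at its output.

    All default values are illustrative assumptions, not the thesis's settings.
    """
    rf_in = 1  # a raw input frame covers only itself
    per_block = []
    for dilation in dilations_per_block:
        # Dense connectivity passes the block input through to the output,
        # so its receptive field is listed alongside the new layers.
        fields = [rf_in]
        rf = rf_in
        for _ in range(layers_per_block):
            # A stride-1 dilated 1D convolution widens the field by
            # (kernel_size - 1) * dilation frames.
            rf += (kernel_size - 1) * dilation
            fields.append(rf)
        per_block.append(fields)
        rf_in = rf  # the widest field feeds the next block
    return per_block


if __name__ == "__main__":
    # Constant dilation inside each block, ascending across blocks.
    for index, fields in enumerate(receptive_fields((1, 2, 4)), start=1):
        print(f"block {index}: receptive fields {fields}")
```

Under these assumptions, each module exposes several temporal scales at once (for example 1, 3, 5, and 7 frames in the first module), while the ascending dilation lets later modules reach 43 frames with only nine convolutional layers. This is one way to see how short and long action segments could both be covered.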
Keywords/Search Tags:Deep Learning, Action Detection, Action Recognition, Multi-scale, DenseNet, Dilated Convolution