Research On Spatiotemporal Action Detection Based On Deep Learning

Posted on:2022-09-15

Degree:Master

Type:Thesis

Country:China

Candidate:Y M Wang

Full Text:PDF

GTID:2518306551470064

Subject:Computer Science and Technology

Abstract/Summary:

Technologies for automatically detecting specific events in public places are of vital significance for public security and for the development of an increasingly intelligent society. Spatiotemporal Action Detection (STAD) technologies, which detect specific behaviors together with their locations, are therefore desirable across a broad range of applications. In particular, Violence Detection (VD) technologies, which are mainly used to detect violent incidents such as violence in schools and prisons, have been extensively explored to meet application requirements. Although traditional STAD technology based on hand-crafted features has reached maturity, its low computational efficiency and weak feature representation have greatly hindered its widespread practical use. Fortunately, the growing application of deep learning in computer vision has substantially contributed to the development of STAD. This thesis presents a thorough study of deep-learning-based violence detection and spatiotemporal action detection technologies, which can be summarized as follows:

(1) Generally, the task of violence detection is merely to determine whether violent behavior is present in a given video, ignoring the corresponding spatial locations. Here, the VD technology based on STAD not only identifies violent behaviors but also detects the corresponding spatial and temporal information. Inspired by the two-stage object detection architecture, we designed a VD model based on R-CNN. In this model, an actor proposal network generates region proposals for humans, and the spatiotemporal features of violent behavior are obtained by using three-dimensional convolution and modeling the relevant region features within a certain time range. The effectiveness of the model has been verified by extensive experiments. On the basis of this model, we further designed a complete VD system that covers the whole process, from data acquisition to the preservation of detection results, and optimized the detection process for better performance in practical settings (online or offline).

(2) Existing methods for spatiotemporal action detection are usually derived from the two-stage detection architecture widely used in object detection, which comprises a localization step and a classification step. However, this architecture inevitably leads to high computational costs and sub-optimal solutions when applied to STAD. This thesis proposes a simple and computationally efficient STAD model named MUB-Detector, which is time-sensitive and multi-branched. MUB-Detector is based on a three-dimensional convolutional neural network with powerful spatiotemporal modeling ability, and it simplifies the STAD task into multiple one-stage "object" detections. The spatial location and action category of the action instances in each frame of the input video clip can then be obtained, completing STAD in a single stage. Experimental results on two benchmark datasets (J-HMDB and UCF101-24) show that, compared with methods based on the two-stage detection architecture, the unified STAD framework proposed here effectively improves detection efficiency. In particular, compared with traditional methods that require additional optical flow, which incurs expensive computation, MUB-Detector achieves competitive detection accuracy and faster detection speed with only RGB image inputs.
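
The one-stage idea in item (2), where each frame of a clip is treated as its own "object" detection, can be illustrated with a minimal Python sketch. It is not the MUB-Detector implementation, which the abstract does not specify. The output layout (one list of grid-cell predictions per frame, each holding a centre-format box, an objectness value, and per-class scores), the function names `decode_frame` and `decode_clip`, and the score threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout of one raw cell prediction (an assumption, not from the thesis):
# (cx, cy, w, h, objectness, class_scores)
Cell = Tuple[float, float, float, float, float, Sequence[float]]
# A decoded detection: ((x1, y1, x2, y2), class_id, score)
Detection = Tuple[Tuple[float, float, float, float], int, float]


def decode_frame(cells: Sequence[Cell], score_thresh: float = 0.5) -> List[Detection]:
    """Turn the raw cell predictions of one frame into scored boxes."""
    detections: List[Detection] = []
    for cx, cy, w, h, objectness, class_scores in cells:
        # Pick the most likely action class for this cell.
        class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
        score = objectness * class_scores[class_id]
        if score >= score_thresh:
            # Convert centre format to corner format.
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            detections.append((box, class_id, score))
    return detections


def decode_clip(branch_outputs: Sequence[Sequence[Cell]],
                score_thresh: float = 0.5) -> List[List[Detection]]:
    """Decode every frame independently: one branch output per frame."""
    return [decode_frame(cells, score_thresh) for cells in branch_outputs]


if __name__ == "__main__":
    # Two-frame toy clip with made-up values; the second frame has no confident cell.
    clip = [
        [(0.5, 0.5, 0.2, 0.4, 0.9, [0.1, 0.8])],
        [(0.3, 0.3, 0.1, 0.1, 0.2, [0.9, 0.1])],
    ]
    for t, dets in enumerate(decode_clip(clip)):
        print(t, dets)
```

Under these assumptions, decoding needs no region proposals and no second classification pass, which is the kind of saving the abstract attributes to a one-stage design. A practical detector would also apply non-maximum suppression and link per-frame boxes over time; both are omitted here.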

Keywords/Search Tags:

violence detection, spatiotemporal action detection, deep learning

Related items

1	Violence Detection Method Based On Deep Learning
2	Intelligent Violence Detection Based On Deep Learning Method
3	Action Detection Based On Deep Learning
4	Research And Application Of Violence Detection System Based On Deep Learning
5	Research On Crowd Violence Video Detection Technology Based On Improved 3D Convolution Network
6	Violence Detection And Face Recognition Based On Deep Learning Method
7	Spatiotemporal Deep Neural Network For Video Salient Object Detection
8	Research On Human Action Analysis And Recognition Method Based On Deep Learning
9	Research On Violence Detection Algorithm Based On Trajectory Analysis
10	Spatiotemporal Multi-task Network For Video Action Detection