Research And Implementation Of Video Action Detection Algorithm Based On Deep Learning In Oil Unloading Area

Posted on: 2023-12-05
Degree: Master
Type: Thesis
Country: China
Candidate: F Li
Full Text: PDF
GTID: 2531307073982999
Subject: Computer Science and Technology
Abstract/Summary:
The oil unloading area of a gas station is prone to dangerous accidents. During oil unloading in particular, slight carelessness by personnel in any of the operations can easily lead to a safety accident, which not only causes equipment damage and casualties but also threatens the lives and property of nearby residents. Detecting and supervising personnel behavior in the oil unloading area is therefore necessary.

With the development of deep learning, video action detection algorithms have made great progress. Such algorithms can recognize actions in video automatically and around the clock, and they are gradually replacing the time-consuming and laborious practice of manually reviewing surveillance footage. The oil unloading area differs from the general video action detection task in three ways. First, people are closely related to the surrounding objects and environment, so it is important to identify the objects a person is operating and the interaction between the person and those objects. Second, personnel move over a very large range, and their walking back and forth causes serious spatio-temporal displacement problems. Third, the existing datasets have few training samples and weak practical applicability. In response to these problems, this thesis completes the following main work.

Firstly, this thesis constructs the Unload-Actions dataset from surveillance video of a real oil unloading area. The dataset includes 30 action categories and 1539 complete spatio-temporal action annotations, and the total length of the annotated videos reaches 6.5 hours.

Secondly, for interaction modeling, this thesis proposes an interaction relationship modeling method based on multi-head attention in order to make effective use of the interaction cues between humans and the surrounding objects and environment. The method first models four feature representations of an interactive action: the character feature, the temporal feature, the background feature and the spatial feature. It then passes these interaction features, together with the human spatio-temporal features, through multiple interaction feature enhancement blocks connected "serial first and then parallel", and finally uses the enhanced interaction relationship feature for action classification. Comparative and ablation experiments on the Unload-Actions dataset show that the method yields a large performance improvement.

Thirdly, for spatio-temporal feature extraction, this thesis proposes an algorithm based on a video clip feature bank to alleviate the serious problem of spatial displacement of actions in video. The core of the algorithm is a video Clip Feature Bank (CFB) that stores the features of video clips that overlap one another on the time axis. When the current key frame is processed, multiple clip features from the adjacent interval and the features of the current clip are jointly fed into a multi-head attention module, and the resulting more accurate spatio-temporal features are used for relationship modeling and action classification. Comparative and ablation experiments on the Unload-Actions dataset demonstrate the effectiveness of the spatio-temporal feature extraction algorithm.

Finally, this thesis designs and implements an action detection system for the oil unloading area. It describes in detail the core modules, namely the video decoding module, the action detection module and the event analysis module; introduces the system architecture design, the program design and the deployment technologies; and shows the operation interface of the system.
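The clip feature bank idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the class `ClipFeatureBank`, the functions `multi_head_attend` and `enhance`, the `window` parameter, the residual sum, and the absence of learned projection matrices are all assumptions made for illustration. The sketch only shows the general pattern of storing per-clip features and letting the current clip attend over its temporal neighbors with multi-head scaled dot-product attention.

```python
import math


class ClipFeatureBank:
    """Hypothetical store mapping a clip index to its feature vector."""

    def __init__(self, window):
        self.window = window      # assumed: number of neighbor clips on each side
        self.features = {}

    def put(self, clip_idx, feature):
        self.features[clip_idx] = list(feature)

    def neighbors(self, clip_idx):
        """Return stored features of clips within `window` of clip_idx."""
        lo, hi = clip_idx - self.window, clip_idx + self.window
        return [self.features[i] for i in range(lo, hi + 1)
                if i != clip_idx and i in self.features]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]


def multi_head_attend(query, memory, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate.

    Assumption: no learned projections; each head is a slice of the
    feature vector, and memory entries serve as both keys and values.
    """
    dim = len(query)
    if dim % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    head_dim = dim // num_heads
    out = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        slices = [m[lo:hi] for m in memory]
        out.extend(attend(query[lo:hi], slices, slices))
    return out


def enhance(bank, clip_idx, current, num_heads=2):
    """Fuse the current clip feature with neighboring clips from the bank."""
    memory = bank.neighbors(clip_idx) + [list(current)]
    attended = multi_head_attend(list(current), memory, num_heads)
    # Assumed residual connection: add the attended context to the input.
    return [c + a for c, a in zip(current, attended)]
```

In this sketch the current clip is always included in the attention memory, so `enhance` still works when the bank holds no neighbors yet; a trained model would normally add learned query, key and value projections per head.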
Keywords/Search Tags: Spatio-Temporal Action Detection, Interaction Modeling, Video Understanding, Attention Mechanism, Deep Learning