
Appearance-Pose Fusion Network for Pedestrian Action Detection and Recognition

Posted on: 2022-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: B B Pan
Full Text: PDF
GTID: 2518306542463444
Subject: Computer technology
Abstract/Summary:
Spatial-temporal action detection of pedestrians is an important computer vision task in the field of video understanding, which aims to accurately classify and localize the action information in video. With the development and popularization of video surveillance systems, the demand for spatial-temporal action detection of video information is growing, especially for detecting abnormal pedestrian behavior in surveillance scenes. Therefore, research on spatial-temporal action detection and pedestrian abnormal behavior detection not only has important scientific value, but also has a wide range of real-life application scenarios. At present, most advanced spatial-temporal action detection algorithms stack the detection results of adjacent frames and then process them. The temporal information of actions is easily lost in convolution and down-sampling operations, which leads to poor modeling of temporal actions and limits the development of spatial-temporal action detection. Given that current pose estimation algorithms have excellent skeleton detection performance and action modeling ability, and in order to solve the above problems, we study the feasibility of fusing RGB image information with pose skeleton information, and apply the fusion network to abnormal behavior detection in complex surveillance scenes. The work of this thesis mainly includes the following two parts.

First, to address the poor temporal modeling ability of current spatial-temporal action detection algorithms, pose skeleton information is introduced to compensate for the limited action description ability of RGB information. Inspired by the two-stream network model, an RGB network and a skeleton network are used to represent the appearance information and the action information of the target, respectively, and an end-to-end RGB-pose fusion network framework is proposed to solve the spatial-temporal action detection problem. The RGB network takes several adjacent video frames as input and generates initial action classification and localization proposals through a multi-frame SSD network followed by non-maximum suppression. The skeleton network takes the same video frames as input, extracts frame-level pose skeletons with an advanced pose estimation network, stacks the skeleton information in temporal order after pose compensation and normalization, and classifies the sequence with an LSTM network. The final detection result is generated by fusing the outputs of the skeleton network and the RGB network. In addition, to reduce the influence of dirty data on the detection results, we propose a network fusion strategy based on score proportion. We selected single-action videos from UCF-101 and other datasets to verify the spatial-temporal action detection and recognition ability of the fusion network in simple scenes.
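The skeleton branch and the score-proportion fusion can be pictured with the following minimal PyTorch sketch. The layer sizes, the 17-keypoint layout, the number of action classes, and the exact form of the proportional weighting are illustrative assumptions, not the configuration used in the thesis.

```python
# Minimal sketch of the skeleton branch and score-proportion fusion described
# above. The layer sizes, 17-keypoint layout, class count, and weighting rule
# are assumptions for illustration, not the thesis' exact configuration.
import torch
import torch.nn as nn


class SkeletonLSTM(nn.Module):
    """Classify a temporally stacked sequence of normalized 2D skeletons."""

    def __init__(self, num_joints=17, hidden=128, num_classes=24):
        super().__init__()
        self.lstm = nn.LSTM(num_joints * 2, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, skeletons):
        # skeletons: (batch, frames, num_joints, 2) -- pose-compensated and
        # normalized keypoint coordinates, stacked along the time axis.
        b, t, j, c = skeletons.shape
        seq = skeletons.view(b, t, j * c)
        _, (h, _) = self.lstm(seq)
        return self.fc(h[-1])                    # per-class action scores


def fuse_by_score_proportion(rgb_scores, skel_scores):
    """Weight each stream in proportion to its own top confidence,
    so a low-confidence (possibly dirty) stream contributes less."""
    rgb_conf = rgb_scores.softmax(-1).max(-1).values
    skel_conf = skel_scores.softmax(-1).max(-1).values
    w = rgb_conf / (rgb_conf + skel_conf + 1e-8)
    return w.unsqueeze(-1) * rgb_scores.softmax(-1) + \
        (1 - w).unsqueeze(-1) * skel_scores.softmax(-1)


# Example: fuse the two streams' scores for one proposal over an 8-frame clip.
skel_net = SkeletonLSTM()
skeletons = torch.randn(1, 8, 17, 2)             # stacked skeletons for one proposal
rgb_scores = torch.randn(1, 24)                  # scores from the multi-frame SSD branch
fused = fuse_by_score_proportion(rgb_scores, skel_net(skeletons))
print(fused.argmax(-1))                          # fused action class
```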
Second, driven by practical demand, the fusion network is applied to multi-person abnormal behavior detection in complex surveillance scenes. To address the poor performance of pose estimation algorithms on small targets and the difficulty of linking the skeleton keypoints of multiple pedestrian targets across frames, this work improves on the first part in two respects. First, we introduce a multi-scale pose estimation strategy: moving targets are detected with the ViBe algorithm and the input data are preprocessed; after cropping, pose estimation is performed and the results are mapped back to the original image. Second, since abnormal video frames change rapidly and it is difficult to match multiple target skeletons across adjacent frames, we propose a new skeleton matching strategy that replaces the simple single-skeleton stacking operation with matching of pose action regions between adjacent frames. In addition, we recorded and built an abnormal behavior dataset containing 112 fighting actions and conducted experiments on it. The experimental results show that the appearance-pose fusion network achieves good detection performance for multi-target abnormal behavior in surveillance scenes.
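The small-target preprocessing and the adjacent-frame region matching can be outlined as follows. This is a rough sketch that assumes the crop boxes come from the ViBe motion detection step and the keypoints from the pose estimator (neither shown here); the greedy IoU matching with a 0.3 threshold is an illustrative stand-in for the thesis' matching rule.

```python
# Sketch of mapping crop-level keypoints back to the full image and of linking
# pose action regions between adjacent frames. Boxes are assumed to come from
# the ViBe motion detection step; the IoU threshold is an assumed value.
import numpy as np


def map_keypoints_back(keypoints_in_crop, box):
    """Map keypoints estimated on a cropped region back to full-image coords."""
    x1, y1, _, _ = box
    return keypoints_in_crop + np.array([x1, y1])


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-8)


def match_action_regions(prev_boxes, cur_boxes, thresh=0.3):
    """Greedily link each current-frame action region to the previous-frame
    region with highest IoU, instead of stacking single skeletons blindly."""
    matches, used = [], set()
    for i, cb in enumerate(cur_boxes):
        best_j, best_iou = -1, thresh
        for j, pb in enumerate(prev_boxes):
            if j in used:
                continue
            score = iou(pb, cb)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j >= 0:
            used.add(best_j)
            matches.append((best_j, i))
    return matches


# Example: link action regions between two adjacent frames.
prev = [(10, 20, 60, 120), (200, 40, 260, 160)]
cur = [(205, 42, 262, 158), (12, 25, 63, 118)]
print(match_action_regions(prev, cur))           # e.g. [(1, 0), (0, 1)]
```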
Keywords/Search Tags: Spatial-temporal action detection, Video understanding, Abnormal behavior detection, Information fusion, Pose skeleton