| Security devices such as surveillance cameras and UAVs are rapidly becoming widespread, creating a need to analyze and understand the growing volume of video data automatically. Spatio-temporal action detection identifies the category, the start and end times, and the per-frame spatial position of each action in untrimmed video, and it has attracted increasing attention in industry. Applying such algorithms in security faces two problems: adjacent action instances interfere with one another in crowded pedestrian scenes, and model generalization degrades when the model is deployed for online detection across devices with different computing power. To address these problems, this thesis proposes an action detection algorithm for crowded pedestrian scenes and an online action detection algorithm with strong generalization. The main research contents are as follows: (1) This thesis proposes an offline action detection algorithm for crowded pedestrian scenes. To avoid extracting features from adjacent actions, the algorithm introduces deformable convolution, a channel attention module, and a Re-ID model, which improves detection performance at the video-clip level. To address the shortcomings of IoU-based linking algorithms in crowded pedestrian scenes, Re-ID embedding features are used to measure the similarity between temporally adjacent tubelets, which improves the original linking algorithm and the video-level detection performance. The algorithm improves Frame-mAP by 2.34% and Video-mAP by 3.89% on a custom dense-pedestrian dataset, and Frame-mAP by 1.2% and Video-mAP by 1.5% on the UCF101-24 dataset. The results show that the algorithm improves spatio-temporal action detection performance in crowded pedestrian scenes. (2) This thesis proposes an online action detection algorithm with strong generalization. To counter the performance drop caused by the inability to observe future frames in the online detection setting, the algorithm uses
privileged knowledge distillation and curriculum learning to transfer the future-frame information learned by the teacher network to the student network. Experiments show that this improves the model's online detection performance. In addition, to address the loss of generalization caused by differences in computing power across devices, a lightweight action representation measurement module is proposed to learn the relationships between action instances at multiple temporal scales, which improves the model's generalization across temporal scales. Compared with the algorithm without the action representation measurement module, it improves Frame-mAP by 6.88% and Video-mAP by 7.15% on the UCF101-24 dataset. (3) This thesis develops an action retrieval system based on pedestrian images. Given pedestrian images collected in advance, the system retrieves the actions of the specified pedestrians in a real-time input video sequence. If a retrieved action is dangerous, an alarm is triggered. |
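
The idea in (1) of using Re-ID embeddings alongside IoU when linking temporally adjacent tubelets can be illustrated with a minimal sketch. This is not the thesis's implementation: the weighted-sum score, the weight `alpha`, the dictionary layout with `box` and `emb` keys, and the function names are all assumptions made for illustration.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity of two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def link_score(prev, cand, alpha=0.5):
    """Hypothetical linking score: weighted sum of box overlap and
    Re-ID embedding similarity. alpha=1.0 reduces to IoU-only linking."""
    return (alpha * iou(prev["box"], cand["box"])
            + (1.0 - alpha) * cosine(prev["emb"], cand["emb"]))


def best_match(prev, candidates, alpha=0.5):
    """Index of the candidate tubelet with the highest linking score."""
    return max(range(len(candidates)),
               key=lambda i: link_score(prev, candidates[i], alpha))


# Toy crowded scene: candidate 0 overlaps more but belongs to another
# person (orthogonal embedding); candidate 1 overlaps less but matches.
prev = {"box": (0, 0, 10, 10), "emb": (1.0, 0.0)}
cands = [
    {"box": (1, 0, 11, 10), "emb": (0.0, 1.0)},
    {"box": (3, 0, 13, 10), "emb": (1.0, 0.0)},
]
print(best_match(prev, cands, alpha=1.0))  # IoU only
print(best_match(prev, cands, alpha=0.5))  # IoU plus Re-ID similarity
```

In this toy case the IoU-only score links to the neighbouring pedestrian, while the combined score links to the candidate whose embedding matches, which is the failure mode the abstract attributes to IoU-based linking in crowded scenes.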