
Video Abnormal Event Detection Based On Spatio-temporal Segmentation Network

Posted on: 2024-04-23  Degree: Master  Type: Thesis
Country: China  Candidate: T Huang  Full Text: PDF
GTID: 2568306935982669  Subject: Computer Science and Technology
Abstract/Summary:
Numerous surveillance devices are deployed in public areas such as train stations, airports, and shopping centers, generating vast quantities of monitoring footage. Detecting abnormal behavior in crowded scenes and long videos is challenging, because the many objects in a scene frequently occlude one another and crowds move irregularly. Automatically detecting abnormal events in long video sequences is therefore essential for intelligent monitoring systems. Owing to the rapid progress of deep learning, video anomaly detection based on deep models has been studied extensively in recent years. A popular approach detects abnormal events with two processing streams: a spatial stream that learns the appearance structure of the video and a temporal stream that learns its motion structure, together with the correspondence between appearance and the associated motion; how to learn spatio-temporal features more effectively remains a central research question. In addition, surveillance video suffers from a severe imbalance between positive and negative samples, because abnormal behavior is rare and most footage shows normal behavior. Building on deep learning, this thesis designs two video abnormal behavior detection algorithms that learn temporal and spatial information more effectively and reduce the influence of imbalanced data. The main contributions are as follows:

(1) Since the appearance and motion characteristics of abnormal behavior differ markedly from those of normal behavior, this thesis proposes an improved temporal segment network to learn appearance and motion information from video. A temporal convolutional network that takes RGB frame-difference maps as input is constructed to learn motion information; frame-difference maps can be computed much faster than optical flow. A spatial convolutional network that takes RGB images as input is constructed to learn appearance information, and the outputs of the two streams are fused. The temporal segment network is further enhanced with a convolutional block attention module, whose separate channel and spatial attention modules extract more precise features while adding only a small overhead. A focal loss function is introduced to mitigate the severe imbalance between positive and negative samples (a sketch follows below). Extensive experimental comparisons show that the proposed method performs well: the AUC reaches 77.6% on the UCF-Crime dataset and 83.3% on the CUHK Avenue dataset.
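As an illustration of the class-imbalance handling mentioned above, the following is a minimal PyTorch-style sketch of a binary focal loss. The function name, the use of sigmoid outputs, and the default alpha and gamma values are illustrative assumptions, not values taken from the thesis.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy (mostly normal) samples so the
    rare abnormal samples contribute more to the gradient.
    alpha and gamma are illustrative defaults, not thesis hyperparameters."""
    # per-sample cross entropy on raw logits
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # probability the model assigns to the true class
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    # class-balancing weight
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma suppresses the loss of well-classified samples
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```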
(2) To address the severe imbalance of positive and negative samples in surveillance video and the problem of sudden, unknown abnormal events, this thesis proposes a video abnormal behavior detection algorithm based on spatio-temporal features. First, a Faster R-CNN object detection network is used to obtain the anchor-box positions and sizes of human targets; dilated convolutions are incorporated into the network to improve its generalization ability. The detected targets are then fed into a slow-fast motion classification network, which produces prediction scores for normal and abnormal human motion: the slow pathway runs at a low frame rate to capture the spatial semantics of the video, while the fast pathway runs at a high frame rate to capture its motion semantics. A gradient harmonizing mechanism is introduced into the motion classification network to reduce the influence of class imbalance. Finally, the instance-level prediction scores from the action classification network are fed into a multi-instance learning model to predict the anomaly score of each video (see the sketch below). On the UCF-Crime dataset the method achieves good experimental results, with an AUC of 77.9%.
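The following is a minimal sketch of how instance scores from the classification network could be aggregated under multi-instance learning, assuming the common formulation in which a video (bag) is scored by its highest-scoring instance and anomalous bags are ranked above normal ones with a hinge loss. The function names and the margin value are illustrative assumptions, not the exact model used in the thesis.

```python
import torch

def video_anomaly_score(instance_scores):
    """Score a bag (video) by its highest-scoring instance: if any detected
    person/clip looks abnormal, the whole video is flagged as abnormal."""
    return instance_scores.max()

def mil_ranking_loss(pos_bag_scores, neg_bag_scores, margin=1.0):
    """Hinge ranking loss pushing the top instance score of an anomalous bag
    above the top score of a normal bag; margin is an assumed hyperparameter."""
    return torch.clamp(
        margin - pos_bag_scores.max() + neg_bag_scores.max(), min=0.0
    )
```

In this weakly supervised setting, only video-level labels (anomalous or normal) are needed for training, which matches the bag-level supervision available in datasets such as UCF-Crime.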
Keywords/Search Tags:Video anomaly detection, Temporal and spatial characteristics, Convolutional neural network, Focal loss, Multi-instance learning