In deep-learning-based anomaly detection systems, the feature extraction ability of convolutional neural networks is so strong that many abnormal samples can still be reconstructed from features extracted from normal samples. Extracting, classifying, and storing features, and then reconstructing from the classified features, partially alleviates this problem. However, feature information is still lost, and abnormal behavior recognition models with a memory module are limited by storage capacity and by weak correlation along the time axis. Traditional methods based on regression models and shallow learning cannot extract data features effectively, so their detection performance is poor. Deep learning methods, in turn, have difficulty learning long sequences and are prone to vanishing gradients, so they can only judge whether behavior is abnormal. To address these problems, this thesis proposes a temporal and spatial attention mechanism that replaces the memory module to avoid the loss of feature information, and a multi-scale network that learns deep features and captures both local and global long-range dependencies. The main work of this thesis covers the following two aspects:

(1) A video anomaly detection method based on temporal and spatial attention is proposed to address the limited capacity of the memory module and the limited extraction of foreground feature information. Because anomalies occur over a period of time, temporal attention lets the model focus on abnormal time segments and improves detection efficiency. During detection, the background tends to dilute the abnormal region, so the model should concentrate on learning foreground features, suppress irrelevant background features, extract the points of interest in each frame, and attend to local regions. Spatial attention is introduced for this purpose and further improves detection efficiency. Experimental results on the standard benchmark dataset show that the proposed method achieves a better recognition rate and detection speed than other methods.

(2) A model based on a multi-scale temporal network is proposed to capture information across the length of a video and to enhance the recognition of rare abnormal segments in abnormal videos. Convolution kernels of different sizes are applied to the feature maps obtained at a given moment, producing new feature maps of different sizes, which are then upsampled to enrich the image features. The Multi-scale Temporal Feature Learning (MTN) module covers multi-resolution local temporal correlations and global temporal correlations between video clips. Temporal attention effectively learns contextual information across video frames and improves detection accuracy. Experiments show that this method effectively improves the detection performance of the model.
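
The two ideas summarized above, temporal attention over video clips and multi-scale temporal feature aggregation, can be illustrated with a minimal sketch. It is not the implementation used in this thesis. The function names (`temporal_attention`, `temporal_window_mean`, `multi_scale_temporal`), the mean-activation scoring rule, the fixed averaging kernels that stand in for learned temporal convolutions, and the kernel sizes (1, 3, 5) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(clip_features):
    """Reweight each clip by a softmax over a per-clip score.

    ASSUMPTION: the score is the clip's mean activation; a trained model
    would learn this scoring function instead.
    """
    scores = [sum(f) / len(f) for f in clip_features]
    weights = softmax(scores)
    attended = [[w * x for x in f] for w, f in zip(weights, clip_features)]
    return weights, attended


def temporal_window_mean(clip_features, kernel_size):
    """Average each feature over a centered temporal window.

    ASSUMPTION: a fixed averaging kernel of odd size stands in for a learned
    temporal convolution; indices are clamped at the sequence boundaries.
    """
    num_clips = len(clip_features)
    dim = len(clip_features[0])
    half = kernel_size // 2
    output = []
    for t in range(num_clips):
        window = [min(max(i, 0), num_clips - 1)
                  for i in range(t - half, t + half + 1)]
        output.append([sum(clip_features[i][d] for i in window) / len(window)
                       for d in range(dim)])
    return output


def multi_scale_temporal(clip_features, kernel_sizes=(1, 3, 5)):
    """Concatenate, per clip, the features computed at each temporal scale."""
    per_scale = [temporal_window_mean(clip_features, k) for k in kernel_sizes]
    fused = []
    for t in range(len(clip_features)):
        vector = []
        for scale in per_scale:
            vector.extend(scale[t])
        fused.append(vector)
    return fused


# Four clips with two-dimensional features; the third clip has unusually
# large activations and should receive the largest attention weight.
clips = [[0.1, 0.2], [0.1, 0.3], [2.0, 2.5], [0.2, 0.1]]
weights, attended = temporal_attention(clips)
fused = multi_scale_temporal(attended)
```

In this sketch, the smallest kernel preserves each clip's own features, while the larger kernels mix in neighboring clips, so the concatenated vector carries both local and wider temporal context.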