Video-based anomalous behavior detection is a challenging topic in computer vision. One main reason is that it is difficult to give a specific yet general definition of anomalous behavior in video: anomalies are highly scene-dependent, and the same behavior may be judged differently in different scenes. The central difficulty of video anomaly detection is therefore that anomaly discrimination relies heavily on scene context. To address this problem, the prevailing view is to adopt a dataset-centric unsupervised approach, in which behaviors that occur rarely or not at all in the dataset, or that differ greatly from frequently occurring behaviors, are regarded as anomalous.

With the development of deep learning, video-based anomalous behavior detection methods have also advanced significantly. To address the strong scene dependence of abnormal behavior and the scarcity of abnormal samples, this thesis uses unsupervised learning to construct network frameworks for video anomaly detection, with the goal of learning a network model described entirely by normal samples from the training set; events and activities not included in the training set are then judged anomalous at test time. The main research work is as follows:

1. A video anomaly detection method based on spatial perception and attention fusion is proposed, addressing the insufficient extraction of spatial appearance features in the traditional encoder-decoder architecture, which limits detection capability. The method uses a spatially aware encoding network to strengthen the extraction of spatial appearance features from video frames, while modeling the global dependencies of the deep spatial encoding through non-local modules. An attention fusion module then establishes interaction between the spatial encoding and the temporal encoding extracted by the conventional encoding network, enhancing the network's discriminative representation of input samples. Experiments show that the method reaches frame-level AUCs of 84.2% and 95.7% on the public benchmark datasets UCSD Ped1 and Ped2, respectively, demonstrating good detection performance for abnormal behaviors in video.

2. A video anomaly detection method based on global semantic enhancement and multi-scale fusion is proposed, addressing the semantic gap and limited local receptive field of the traditional U-Net architecture, which degrade detection in anomaly detection methods that use a future-frame prediction strategy. The method uses the constructed global semantic enhancement structure to inject the global semantics of deep encodings into the shallow encodings, compensating for the missing contextual semantics at shallow layers and suppressing redundant noise. Meanwhile, the designed multi-scale fusion module establishes multi-scale mapping relationships among decoder features and strengthens the network's spatial feature learning for local foreground regions. Experiments show that the method reaches frame-level AUCs of 85.1%, 85.3%, and 95.9% on the public benchmark datasets Avenue, UCSD Ped1, and UCSD Ped2, respectively, achieving good video anomaly detection.
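As context for the non-local global-dependency modeling mentioned in method 1, the sketch below shows a minimal embedded-Gaussian non-local operation over the flattened spatial positions of a feature map. It is an illustrative numpy reconstruction of the standard non-local block, not the thesis's actual architecture; the projection matrices and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Embedded-Gaussian non-local operation on a (C, H, W) feature map.

    Each output position aggregates information from ALL spatial positions
    in one step, which is how the block models global dependencies that a
    stack of local convolutions captures only slowly.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N), N = H*W positions
    theta = w_theta @ flat                   # (C', N) query embedding
    phi = w_phi @ flat                       # (C', N) key embedding
    g = w_g @ flat                           # (C', N) value embedding
    attn = softmax(theta.T @ phi, axis=-1)   # (N, N) pairwise affinities
    y = g @ attn.T                           # (C', N) attended features
    out = w_out @ y                          # (C, N) project back to C channels
    return (flat + out).reshape(c, h, w)     # residual connection

# Illustrative usage with random weights (c_in=8 channels, c_mid=4 bottleneck):
rng = np.random.default_rng(0)
c_in, c_mid = 8, 4
x = rng.standard_normal((c_in, 4, 4))
w_theta, w_phi, w_g = (rng.standard_normal((c_mid, c_in)) * 0.1 for _ in range(3))
w_out = rng.standard_normal((c_in, c_mid)) * 0.1
y = non_local_block(x, w_theta, w_phi, w_g, w_out)
print(y.shape)  # (8, 4, 4) -- same shape as the input feature map
```

The residual form means the block can be dropped into an existing encoder without changing feature-map shapes, which matches how non-local modules are typically inserted into encoder-decoder backbones.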
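Both methods above are evaluated by frame-level AUC, which requires a per-frame anomaly score. The thesis abstract does not spell out its scoring rule, but a common convention in future-frame-prediction work is sketched below: compute PSNR between the predicted and actual frame, min-max normalize it into a regularity score per video, and take one minus that as the anomaly score. Function names here are hypothetical.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted and an actual frame.

    High PSNR = the frame was predicted well = likely normal.
    """
    mse = np.mean((pred - gt) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def anomaly_scores(preds, gts):
    """Per-frame anomaly scores in [0, 1]: poorly predicted frames score high."""
    p = np.array([psnr(a, b) for a, b in zip(preds, gts)])
    regularity = (p - p.min()) / (p.max() - p.min() + 1e-8)  # min-max normalize
    return 1.0 - regularity                                  # anomaly = 1 - regularity

# Illustrative usage: 5 frames, frame 3 is predicted badly (simulated anomaly).
rng = np.random.default_rng(1)
gts = [rng.random((16, 16)) for _ in range(5)]
preds = [g + rng.normal(0.0, 0.01, g.shape) for g in gts]  # accurate predictions
preds[3] = rng.random((16, 16))                            # failed prediction
scores = anomaly_scores(preds, gts)
print(scores.argmax())  # 3 -- the badly predicted frame gets the highest score
```

Frame-level AUC is then obtained by sweeping a threshold over these scores against the ground-truth frame labels (e.g. with a standard ROC-AUC routine).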