
Appearance-Motion Memory Consistency Network For Video Anomaly Detection

Posted on: 2022-05-05    Degree: Master    Type: Thesis
Country: China    Candidate: H Zhang    Full Text: PDF
GTID: 2518306539462504    Subject: Computer technology
Abstract/Summary:
With the rapid development of artificial intelligence technology, especially the breakthroughs in basic theory and engineering practice represented by deep learning, smart cities and security-related industries have advanced by leaps and bounds. One important application is the intelligent video surveillance system, and detecting abnormal events in surveillance video is a very important yet very difficult task. Many methods have recently been proposed for this problem, but previous approaches either consider only one-sided appearance or motion information, or simply fuse the decision results of the appearance and motion branches at test time, without considering the inherent consistency and shared semantics of the two modalities.

Inspired by how humans recognize abnormal events, this thesis proposes the Appearance-Motion Memory Consistency Network (AMMC-Net), which detects abnormal events by explicitly modeling the consistency of the prior semantic information carried by the appearance-motion bimodal signals in video. Since raw features mix prior information characterizing normal semantics with frame-specific information, only the prior information can be expected to be semantically consistent across modalities. Our method first builds a video prediction framework enhanced by a discrete dictionary module to model the prior features of the appearance and motion modalities in normal events and store them in their respective memory pools. Because a single memory item is insufficient to represent a query vector, we combine multiple memory items. Second, we design a bidirectional mapping network to realize mutual expression and fusion between the prior information of the two modalities; that is, the semantic consistency between the two modalities is modeled explicitly, yielding semantic consistency features. Finally, considering that the prior features
have lost the original feature information, we combine the raw features, prior features, and semantic consistency features of each modality to obtain a more essential and discriminative representation of normal events, which greatly enlarges the gap between abnormal and normal events.

In the anomaly detection stage, considering that previous methods rely only on the pixel prediction error in pixel space and are therefore easily disturbed by background jitter and noise, we introduce a semantic consistency error in feature space to improve detection performance. Specifically, we use a weight parameter to linearly combine the feature-space semantic consistency error with the pixel-space prediction error, and normalize the combined error over all test frames to construct the final anomaly score. In the model optimization stage, we adopt a two-stage training strategy for the whole model and propose an exponential-moving-average method to update the memory pool module.

Finally, in the experimental phase, we conduct extensive quantitative comparisons on several large public benchmark datasets, verifying the advantages of the proposed method over competing methods and achieving the best anomaly detection performance to date. A large number of ablation experiments, in which individual modules are removed and the final scores recomputed, confirm the effectiveness of each module of the proposed method.
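The mechanisms described above — soft-addressing a memory pool with multiple items, updating the memory by exponential moving average, and fusing the normalized pixel prediction error with the feature-space consistency error into a final score — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the thesis implementation: the function names, the fusion weight `alpha`, and the momentum value are illustrative.

```python
import numpy as np

def query_memory(query, memory):
    """Soft-address the memory pool: rather than a single nearest item,
    combine multiple memory items weighted by similarity to the query."""
    q = query / (np.linalg.norm(query) + 1e-8)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    w = np.exp(m @ q)   # cosine similarity -> positive weights
    w /= w.sum()        # softmax addressing weights over memory items
    return w @ memory   # retrieved prior feature

def ema_update(memory, new_items, momentum=0.99):
    """Move memory items toward newly observed normal features with an
    exponential moving average (momentum value is illustrative)."""
    return momentum * memory + (1.0 - momentum) * new_items

def anomaly_scores(pixel_errors, consistency_errors, alpha=0.5):
    """Linearly combine the pixel-space prediction error and the
    feature-space semantic consistency error, each min-max normalized
    over all test frames."""
    def normalize(e):
        e = np.asarray(e, dtype=float)
        return (e - e.min()) / (e.max() - e.min() + 1e-8)
    return alpha * normalize(pixel_errors) + (1 - alpha) * normalize(consistency_errors)

# Toy usage: a pool of 10 memory items of dimension 8.
rng = np.random.default_rng(0)
memory = rng.standard_normal((10, 8))
prior = query_memory(rng.standard_normal(8), memory)

# Five test frames; frame 3 has large errors in both spaces.
pix = [0.10, 0.12, 0.11, 0.45, 0.13]
con = [0.20, 0.22, 0.21, 0.80, 0.19]
scores = anomaly_scores(pix, con)
print(int(scores.argmax()))  # → 3 (the anomalous frame)
```

Normalizing each error sequence over all test frames before fusion, as the abstract describes, keeps the two error terms on a comparable scale so a single weight parameter suffices.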
Keywords/Search Tags: anomaly detection, video understanding, prior information, multi-modality