| Surveillance videos are vital for recording, replaying, and analyzing abnormal events. Traditional approaches to detecting and analyzing abnormal events require manual review of surveillance footage and therefore depend on a large amount of manpower. Action detection algorithms based on deep learning architectures can learn robust activity patterns automatically from massive video data, and they are gradually becoming a solution for real-time identification of abnormal events. Grounded in computer vision theory, continuously improving action detection algorithms have broad prospects and great research value. Common activity detection algorithms still have several problems, e.g., low behavior recognition accuracy, a high false alarm rate on normal events, and time-consuming detection response. Abnormal events in surveillance videos are diverse, cannot be well predicted in advance, and occur far less frequently than normal events. Taking this into account, and working along the dimensions of temporal range and spatial location with human object and action features, the dissertation proposes a multi-stage method for detecting, classifying, and annotating abnormal events. The proposed method responds in a timely manner with considerable accuracy. The main contributions and work of the dissertation are as follows: (1) A region-proposal-based convolutional network and a feature pyramid network are fused through lateral connections. An object detection model with a residual backbone network is used to locate human objects in videos quickly and precisely. Experiments showed that the model adopted in the dissertation improved average accuracy by 6.3%. (2) Using modified video clips as the temporal-stream input instead of a pre-calculated optical flow field, the dissertation proposes a two-stream spatiotemporal network-based action detection model that aggregates human action features in the temporal stream and the spatial stream. The capacity of the action detection model for calculating
spatial motion patterns from video frames is improved by using frame sequences of various spans in both streams. Predictions are then fused by cross-stream connections to infer human actions. Experiments showed that the proposed model improves Top-1 accuracy by 3.5% and Top-5 accuracy by 1.8%, reaching 77.0% and 92.6%, respectively. (3) Leveraging object features and action features, the dissertation proposes an anomaly score learning model based on a multiple instance learning strategy. The model takes object features and action features as inputs, divides video clips into instances, and iteratively learns abnormal behavior patterns from these instances to predict anomaly scores. Through multiple instance learning, abnormal events receive higher scores. Experiments showed that the model has considerable classification performance, with accuracy reaching 77.43%. The dissertation also designed and implemented a human abnormal behavior recognition system that provides real-time recognition, action classification, and location annotation of abnormal events. |
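
The multiple instance learning strategy in contribution (3) can be illustrated with a minimal sketch. The abstract does not state the loss function used, so the hinge ranking loss below is an assumption: it is one common way to train instance-level anomaly scores from video-level labels, where each video is a bag of instance scores and the top-scoring instance of an abnormal video is pushed above the top-scoring instance of a normal video. The names `mil_ranking_loss`, `video_anomaly_score`, and `margin` are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def mil_ranking_loss(abnormal_bag: Sequence[float],
                     normal_bag: Sequence[float],
                     margin: float = 1.0) -> float:
    """Hinge ranking loss between the top-scoring instances of two bags.

    Each bag holds the predicted anomaly scores of the instances (clip
    segments) of one video. Only video-level labels are assumed: one bag
    comes from an abnormal video, the other from a normal video.
    This loss is an illustrative assumption, not the dissertation's
    confirmed objective.
    """
    if not abnormal_bag or not normal_bag:
        raise ValueError("each bag needs at least one instance score")
    top_abnormal = max(abnormal_bag)  # most anomalous instance, abnormal video
    top_normal = max(normal_bag)      # most anomalous instance, normal video
    # The loss is zero once the abnormal bag leads by at least `margin`.
    return max(0.0, margin - top_abnormal + top_normal)


def video_anomaly_score(instance_scores: Sequence[float]) -> float:
    """Video-level score taken as the maximum instance score (assumed rule)."""
    if not instance_scores:
        raise ValueError("a video needs at least one instance score")
    return max(instance_scores)


if __name__ == "__main__":
    abnormal = [0.0, 1.0, 0.25]  # one segment looks clearly anomalous
    normal = [0.0, 0.0, 0.0]     # no segment looks anomalous
    print(mil_ranking_loss(abnormal, normal))
    print(video_anomaly_score(abnormal))
```

Under this assumed formulation, the loss falls to zero when the abnormal video contains at least one instance scored a full margin above every instance of the normal video, which matches the abstract's statement that abnormal events receive higher scores.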