Public security has received increasing attention in recent years, and intelligent video surveillance systems play an important role in protecting public safety. As a key component of such systems, event detection is an active research topic in computer vision and one of its most challenging tasks. Object detection, which underlies event detection, has likewise attracted considerable attention. This thesis focuses on object detection and event detection in complex scenes. For object detection, we apply Faster R-CNN to the detection of key poses and improve the detection system with hard negative mining, multi-class training, and Soft-NMS. The key-pose-based event detection method achieved first place for the Embrace and Pointing events in both the 2016 and 2017 TRECVID-SED evaluations. These results indicate that the method is effective and can extract discriminative features of key poses. For event detection, in order to exploit the temporal information in videos, we propose a method that focuses on the target event by detecting a person's key poses while incorporating temporal information that describes how the key poses change over time. Specifically, we propose a recurrent model based on ConvLSTM integrated with temporal pooling (CLITP) to capture temporal representations as well as spatial features. In TRECVID-SED 2017, our spatial-temporal event detection method was applied to the ObjectPut event, achieving first place and outperforming the best results of 2016.
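
To illustrate the Soft-NMS step named above, the following is a minimal sketch and not the implementation used in this thesis. It assumes the Gaussian score-decay variant, boxes given as (x1, y1, x2, y2) tuples, and illustrative values for `sigma` and `score_thresh`; the function names `iou` and `soft_nms` are hypothetical. Instead of discarding every box that overlaps a higher-scoring detection, it lowers the scores of overlapping boxes in proportion to their overlap.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS sketch (sigma and score_thresh are assumed values).

    Returns a list of (box, score) pairs in the order they were selected.
    """
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        # Select the highest-scoring box still under consideration.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))

        # Decay the scores of the other boxes according to their overlap
        # with the selected box, and drop those that fall below threshold.
        rescored = []
        for box, score in remaining:
            overlap = iou(best_box, box)
            new_score = score * math.exp(-(overlap ** 2) / sigma)
            if new_score >= score_thresh:
                rescored.append((box, new_score))
        remaining = rescored
    return kept


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    for box, score in soft_nms(demo_boxes, demo_scores):
        print(box, round(score, 4))
```

In this sketch a heavily overlapping box is kept with a reduced score, so a nearby second person is less likely to be suppressed outright than under hard NMS, which matters in crowded surveillance scenes.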