Intelligent Video Surveillance (IVS) has been an active research area in the computer vision community for decades. With the development of digital image processing and artificial intelligence, IVS can not only describe, understand, and analyze video content but also control monitoring devices, which makes video surveillance systems perform better than ever. Research in IVS covers motion extraction, description, detection, tracking, recognition, and behavior analysis of the objects in video frames. Although considerable progress has been made in recent years, detection, one of the most basic and important steps, remains a challenging task. Human pose articulation, scale change, partial occlusion, low resolution, varied illumination, and complex backgrounds all pose major challenges to human detection. To tackle these challenges, we investigate human detection from three aspects: features, partial occlusion, and detectors. Our contributions are organized as follows: 1) To improve detection performance, we build our algorithm from both a whole-body and a part-based perspective. First, we treat the human as a whole window, build a sampled image pyramid, and extract targets at various scales with a multi-scale sliding window. Second, we exploit the analogy between the human body and text to build a compositional model that goes from a body part alphabet to a pose dictionary. 2) We adopt a hybrid feature structure. Because object features are easily disturbed, it is very important to select them well. In our work, we combine gradient magnitude, gradient orientation, and the LUV color space as our hybrid features. 3) We propose a way to handle occlusion based on the part alphabet and the pose dictionary. We form a discriminative part alphabet in which each grapheme is a mid-level element that represents a body part and is learned automatically from bounding box labels.
Based on this alphabet, the flexible structure of the human body is expressed as symbolic sequences, which correspond to various human poses and allow robust, efficient matching. A pose dictionary is constructed from training examples and used to verify human hypotheses at runtime. 4) We conduct experiments on the MCT datasets; the results show that our approach is effective and can detect pedestrians in videos in real time.
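
The multi-scale sliding-window search mentioned in item 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the 64×128 window, the 8-pixel stride, and the scale step of 1.25 are assumed values, and the function names `pyramid_scales` and `multiscale_windows` are hypothetical. The sketch only enumerates candidate windows; feature extraction and classification of each window are left out.

```python
from typing import Iterator, Tuple

# (x, y, width, height) of a candidate window in original-image pixels
Box = Tuple[int, int, int, int]


def pyramid_scales(img_w: int, img_h: int, win_w: int, win_h: int,
                   scale_step: float = 1.25) -> Iterator[float]:
    """Yield downscaling factors for as long as the resized image
    can still contain at least one detection window."""
    scale = 1.0
    while img_w / scale >= win_w and img_h / scale >= win_h:
        yield scale
        scale *= scale_step


def multiscale_windows(img_w: int, img_h: int,
                       win_w: int = 64, win_h: int = 128,
                       stride: int = 8,
                       scale_step: float = 1.25) -> Iterator[Box]:
    """Slide a fixed-size window over every pyramid level and report
    each window mapped back to original-image coordinates."""
    for scale in pyramid_scales(img_w, img_h, win_w, win_h, scale_step):
        # Size of the image at this pyramid level (assumed: simple rescaling).
        level_w = int(img_w / scale)
        level_h = int(img_h / scale)
        for y in range(0, level_h - win_h + 1, stride):
            for x in range(0, level_w - win_w + 1, stride):
                # A fixed window on a downscaled level covers a larger
                # region of the original image, hence the multiplication.
                yield (round(x * scale), round(y * scale),
                       round(win_w * scale), round(win_h * scale))
```

Keeping the window size fixed and shrinking the image means a single classifier trained at one resolution can respond to people at many apparent sizes, at the cost of evaluating more windows on the finer levels.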