With China's announcement of the "Internet+" and "Innovation 2.0" initiatives, the "Smart City" concept has been formally proposed. Urban development and the rapid growth of traffic have brought a rise in behaviors such as fatigued driving and drunk driving, which may endanger public safety. On the one hand, standardized laws and regulations are needed; on the other hand, technology is a powerful supplementary means. "Smart City" covers fields such as robotics, autonomous driving, intelligent security, virtual reality (VR), and augmented reality (AR). VR/AR is considered the technology with the greatest market potential in the next decade; according to a Goldman Sachs analysis, the global VR/AR market could reach 180 billion dollars in 2025. Accurate object detection based on computer vision is the cornerstone of these technologies.

Object detection refers to automatically identifying and locating objects in multimedia data (static images, video sequences, etc.) using computer vision algorithms. The study of humans has always been one of the most important topics in computer vision. Humans are a special class of objects: they have the general characteristics of ordinary objects but also exhibit large intraclass variance, which is precisely what makes pedestrian detection difficult. Therefore, research results on pedestrian detection can be extended to general object detection. Research on pedestrian detection has both scientific value and social application value.

Over the past two years, deep learning has achieved many breakthroughs in computer vision, and pedestrian detection based on deep learning has significantly improved detection performance. Compared with shallow learning, however, the theoretical foundation of deep learning is weak, and it is still in an exploratory stage. Shallow learning works in many fields because its models are simple, flexible, and require little training data.
If suitable features can be extracted for training, shallow learning can still achieve excellent performance.

One of the main advantages of deep learning is that feature extraction involves no hand-crafted rules, which is also a potential shortcoming. Just as appropriate guidance plays an important role in human learning, appropriate guidance during deep learning may lead to better results. This article takes pedestrian detection as its theme, starting from shallow learning and arriving at deep learning, and indicates that suitable shallow guidance can improve the accuracy of deep learning models. The article makes the following contributions:

Firstly, this article presents a fast method, BINGH, that produces high-quality person proposals in place of the traditional sliding window. This method greatly reduces the number of candidate windows passed to image classification. Binarized normed gradients (BING) is one of the best methods for object proposal, but it is limited in describing objects because it uses only the simplest normed gradient (NG) feature. The average hash (aHash) feature describes the low-frequency content of an image well and is extremely cheap to compute. By combining the aHash and NG features, the joint binarized normed gradients and hashes (BINGH) feature describes both the edges and the structure of an object. BINGH achieves a high detection rate (DR) with 500 proposals when trained on the single class of humans. To speed up detection, the SSE instruction set, a few atomic operations (bitwise shift), and other optimizations are used. BINGH runs at 200 fps on a single CPU.

Secondly, this article proposes a joint HOG-Adaptive LBP (HOG-ALBP) method for pedestrian detection. BINGH is used to produce high-quality candidate regions to accelerate detection. In image classification, HOG-ALBP, which expresses texture more accurately than LBP, is employed to reduce false positives on human-like objects.
Meanwhile, the deformable parts model (DPM) and latent SVM (LatSVM) further improve detection. To further reduce computation, the joint descriptor is preprocessed to avoid redundancy in the detection process, and the computation of the HOG feature itself is optimized. Experimental results show that this method reduces the false positive rate and improves efficiency.

Thirdly, this article proposes a pedestrian detection method that combines segmentation by probabilistic aggregation (SPA) and perceptual hash (pHash). Contrast Preserving Decolorization (CPD) aids image recognition by improving the contrast of the grayscale image. The joint descriptor HSH (HOG-SPA-pHash) adapts to variations in pedestrian scale and background and to demosaicing artifacts. Experimental results show that this method retains robustness to deformation and occlusion without using DPM. Its detection performance is similar to that of deep learning methods on several datasets.

Finally, a deep learning method guided by shallow learning, Guided Faster R-CNN, is proposed for pedestrian detection. This article uses a shallow SVM to guide the deep convolutional neural network during dropout. The proposed selective dropout further reduces overfitting in deep learning. Guided learning enhances generalization through complementary advantages: deep learning provides the pipeline, and shallow learning provides the guidance. In addition, building on Faster R-CNN, an adaptive pooling layer is added to the region proposal network (RPN) in place of image resizing, and a spatial pyramid pooling (SPP) layer, which adapts to more scale variations, is added to Fast R-CNN in place of the single ROI pooling layer. Hard example mining is adopted in training, and key deep learning techniques such as randomized rectified linear units and batch normalization are applied to Guided Faster R-CNN.
Experimental results show that Guided Faster R-CNN, implemented on the Caffe platform, achieves excellent, real-time pedestrian detection performance on several datasets.

Poor robustness and expensive computation are persistent problems in many pedestrian detection methods; in other words, they cannot reach a good trade-off between performance and speed. The method proposed in this article generalizes well and shows clear advantages in detection performance over other methods, and its real-time detection may satisfy the needs of practical applications.
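
To illustrate why the average hash (aHash) feature used in BINGH is cheap to compute and captures low-frequency structure, the following is a minimal sketch of the commonly used aHash formulation: downsample the grayscale image to a small grid, then set one bit per cell according to whether the cell is brighter than the mean. It is not the article's implementation. The function names `average_hash` and `hamming_distance`, the 8x8 grid size, and the block-averaging downsampling are assumptions made for this example, and the combination with the NG feature is not shown.

```python
def average_hash(gray, size=8):
    """Return a size*size-bit average hash of a 2D grayscale image.

    `gray` is a list of equal-length rows of numbers; its height and
    width must each be at least `size`. (Hypothetical helper.)
    """
    height, width = len(gray), len(gray[0])
    # Downsample by averaging the pixels that fall into each grid cell.
    cells = []
    for i in range(size):
        r0, r1 = i * height // size, (i + 1) * height // size
        for j in range(size):
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


# Usage: a 16x16 image whose left half is dark and right half is bright.
image = [[0] * 8 + [255] * 8 for _ in range(16)]
brighter = [[p + 10 for p in row] for row in image]   # uniform brightness shift
inverted = [[255 - p for p in row] for row in image]  # opposite structure

print(hex(average_hash(image)))
print(hamming_distance(average_hash(image), average_hash(brighter)))
print(hamming_distance(average_hash(image), average_hash(inverted)))
```

In this sketch the hash is unchanged by a uniform brightness shift, because both the cells and their mean shift together, while an image with the opposite coarse structure differs in every bit. Comparing hashes needs only an XOR and a bit count, which is consistent with the article's emphasis on bitwise operations for speed.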