
Research On Heavily Occluded Pedestrian Detection Based On ResNet

Posted on: 2022-06-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Wang | Full Text: PDF
GTID: 2518306563964219 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Research on pedestrian detection based on convolutional neural networks (CNNs) has made remarkable progress. However, existing pedestrian detectors still cannot reliably handle scenes with heavy occlusion. ResNet is an effective feature extractor and is therefore widely used as a backbone in computer vision tasks such as object detection. This thesis studies heavily occluded pedestrian detection in crowded scenes through two networks, both using ResNet as the backbone: a two-stage multi-scale detector built on an improved Faster R-CNN, and a one-stage single-scale detector built on an improved RetinaNet. The main work is as follows:

(1) Based on the two-stage detector Faster R-CNN, a multi-scale feature pyramid network (MFPN) is proposed for heavily occluded pedestrian detection. A detection head that regresses two predicted boxes from a single proposal is introduced, which effectively reduces the miss rate for heavily occluded pedestrians. To better extract features of heavily occluded pedestrians, a network named Double FPN (Feature Pyramid Network) is proposed to enhance the semantic information and contours of occluded pedestrians during feature extraction and fusion. Repulsion loss is also introduced to push predicted boxes away from the ground truths of other objects, which further improves the accuracy of occluded pedestrian detection.

(2) Multi-scale detection in MFPN consumes a large amount of memory, and the two-stage pipeline is slow. To address both problems, a single-scale detector based on RetinaNet is proposed. The multiple-input, multiple-output structure of the feature pyramid network is simplified to a single-input, single-output structure consisting of four dilated convolutions with different dilation rates, and a simplified decoder detects pedestrians on the resulting single-scale features. In addition, uniform matching is introduced to assign four anchors to each pedestrian, so that occluded pedestrians, especially small ones, are not left without a matched anchor.

Experimental results on the public CrowdHuman dataset show that the improved two-stage network increases detection accuracy by 5.16% with essentially unchanged detection speed, while the improved one-stage network increases detection accuracy by 2.81% and raises the detection speed from 2.9 to 4.3 FPS, demonstrating the effectiveness of the proposed methods.
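A head that regresses two boxes from one proposal needs a loss that does not care which prediction covers which pedestrian. The abstract does not state the thesis's loss; one common choice for this "one proposal, two predictions" design is a set-level (EMD-style) loss that takes the minimum total cost over all one-to-one pairings of predictions and targets. The minimal sketch below covers box regression only, assumes both targets are real pedestrians (no background slot), and uses smooth L1 with a hypothetical `beta = 1.0`; `smooth_l1` and `set_regression_loss` are illustrative names, not the thesis's code.

```python
from itertools import permutations


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 summed over the four box coordinates (x1, y1, x2, y2)."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total


def set_regression_loss(preds, targets, beta=1.0):
    """Minimum summed loss over all one-to-one pairings of the boxes
    predicted from one proposal with the targets assigned to it.
    Assumption: len(preds) == len(targets), e.g. two and two."""
    assert len(preds) == len(targets)
    return min(
        sum(smooth_l1(preds[i], targets[j], beta) for i, j in enumerate(perm))
        for perm in permutations(range(len(targets)))
    )
```

For example, with predictions `[(0, 0, 10, 20), (5, 0, 15, 20)]` and the same two boxes listed as targets in the opposite order, the fixed index-to-index pairing costs 18.0, whereas `set_regression_loss` returns 0.0, so the head is free to output the two overlapping pedestrians in either order.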
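Repulsion loss, originally proposed for crowded pedestrian detection, includes a RepGT term that penalizes a predicted box for overlapping a ground truth other than its own target. For each positive proposal P with regressed box B, the repulsion target G_rep is the non-target ground truth with the largest IoU with P, and the penalty is Smooth_ln(IoG(B, G_rep)), where IoG is the intersection area divided by the ground-truth area. The sketch below implements only this RepGT term in plain Python with a hypothetical `sigma = 0.5`; it omits the attraction term, the RepBox term, and any loss weights, none of which the abstract specifies for this thesis.

```python
import math


def area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a, b):
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def iog(pred, gt):
    """Intersection over the ground-truth area (gt assumed non-degenerate)."""
    return intersection(pred, gt) / area(gt)


def smooth_ln(x, sigma=0.5):
    """-ln(1 - x) up to sigma, then linear so the penalty stays finite as x -> 1."""
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)


def rep_gt_loss(proposals, preds, assigned, gts, sigma=0.5):
    """proposals[i] is positive, preds[i] is its regressed box, and
    gts[assigned[i]] is its own target. Averages the RepGT penalty."""
    total = 0.0
    for prop, box, target_idx in zip(proposals, preds, assigned):
        others = [g for k, g in enumerate(gts) if k != target_idx]
        if not others:
            continue
        # Repulsion target: the other ground truth overlapping the proposal most.
        rep = max(others, key=lambda g: iou(g, prop))
        total += smooth_ln(iog(box, rep), sigma)
    return total / len(preds) if preds else 0.0
```

A box that drifts halfway onto a neighbouring pedestrian (IoG = 0.5) is charged ln 2, and the penalty keeps growing linearly beyond sigma, which is what pushes predictions apart in a crowd.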
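Replacing the feature pyramid with four dilated convolutions works because dilation enlarges the receptive field without extra parameters or downsampling: a k x k kernel with dilation d spans d(k - 1) + 1 cells, and stride-1 layers add their spans. The abstract does not give the kernel size or dilation rates, so the sketch below uses illustrative values (3 x 3 kernels, rates 2, 4, 6, 8). It also assumes, beyond what the abstract states, that each dilated convolution sits in a residual block, so every subset of the layers forms a path and the single output mixes several receptive fields.

```python
from itertools import combinations


def effective_kernel(kernel, dilation):
    """Span of a dilated kernel, in feature-map cells."""
    return dilation * (kernel - 1) + 1


def stacked_receptive_field(kernel, dilations):
    """Receptive field of stride-1 dilated convolutions applied in sequence."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(kernel, d) - 1
    return rf


def residual_path_fields(kernel, dilations):
    """Distinct receptive fields reachable when each layer can be bypassed
    by a residual shortcut (assumed design, see lead-in)."""
    fields = set()
    for r in range(len(dilations) + 1):
        for subset in combinations(dilations, r):
            fields.add(stacked_receptive_field(kernel, subset))
    return sorted(fields)
```

With rates (2, 4, 6, 8) the full stack covers 41 cells versus 9 cells for four undilated 3 x 3 layers, and the residual paths yield every field size from 1 to 41 in steps of 4, which is how a single-scale feature map can still respond to pedestrians of different sizes.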
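Uniform matching gives every pedestrian the same number of positive anchors, four in this thesis, instead of relying on an IoU threshold. Under threshold matching a small box can end up with no positives: a 3 x 3 box inside a 10 x 10 anchor has an IoU of only 0.09. The sketch below assumes, as an illustration, that "nearest" means the smallest L1 distance between boxes in (center x, center y, width, height) form; it ignores any matching on predicted boxes or ignore thresholds the thesis may use, and `uniform_match` is a hypothetical name.

```python
def cxcywh(b):
    """Convert (x1, y1, x2, y2) to (cx, cy, w, h)."""
    x1, y1, x2, y2 = b
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def l1_distance(a, b):
    return sum(abs(p - q) for p, q in zip(cxcywh(a), cxcywh(b)))


def uniform_match(anchors, gts, k=4):
    """Map each ground-truth index to the indices of its k nearest anchors.
    Ties are broken by anchor index because sorted() is stable."""
    matches = {}
    for g_idx, gt in enumerate(gts):
        order = sorted(range(len(anchors)),
                       key=lambda a_idx: l1_distance(anchors[a_idx], gt))
        matches[g_idx] = order[:k]
    return matches
```

For a row of ten 10 x 10 anchors starting at x = 0, a pedestrian at (80, 0, 90, 10) is matched to anchors [8, 7, 9, 6], and the small box (3, 3, 6, 6) still receives four anchors even though none would pass a 0.5 IoU threshold.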
Keywords/Search Tags: Pedestrian detection, Heavy occlusion handling, Faster R-CNN, MFPN, RetinaNet, Dilated convolution