| With the rapid development of convolutional neural networks (CNNs) in computer vision, significant progress has been made in tasks such as object detection and object tracking. Head detection is also widely used in industry, for example for crowd counting at scenic spots and passenger counting in subway and high-speed railway stations. Head detection is a special case of pedestrian detection. In complex scenes, pedestrian detection suffers from occlusion, both between pedestrians and between pedestrians and objects. Compared with the highly variable shape of the whole human body, the head-and-shoulder region varies little in shape and is more stable, which makes it a more reliable detection target under occlusion. However, the head occupies a relatively small area of the body, and existing general-purpose detectors still have a high miss rate on small targets. To address dense and small head targets, we therefore design DRM-YOLO, a head detection algorithm that combines an improved feature extraction network, DR-Net, with a dilated convolution fusion module (Mixed Dilated Convolution, MDC). The main contributions of this deep-learning-based head detection method are as follows. (1) We design DR-Net, an image feature extraction network that combines ResNet with a simplified DenseNet and enables effective head detection in complex scenes with dense and occluded targets. DR-Net uses Darknet-53 as the backbone, prunes its residual modules, and combines them with a four-layer Dense Block, yielding a feature extraction network built on both residual and dense connections. This design reduces the number of parameters, allows the network to be deepened while still converging when trained with back-propagation and stochastic gradient descent, strengthens information flow between layers, and helps the network fully
extract features. (2) We design an MDC module based on dilated convolution, with two variants, MDC1 and MDC2, that differ in their dilation rates; three MDC modules are embedded in front of the YOLO heads. After the spatial feature pyramid connections are made, each MDC module fuses the features of four context-aware modules and passes the result to the YOLO head for detection. This enlarges the receptive field, captures finer-grained information, and effectively improves the detection rate for small targets. (3) Because the human heads detected in this research are small relative to the other objects and the background in an image, the default YOLOv3 anchor boxes do not suit this scenario and must be re-clustered to better fit head targets. To this end, we design a DIoU-based K-means algorithm, use it to re-cluster the anchor boxes of three public datasets, and derive new anchor boxes for the detection network tailored to the head targets studied here. |
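To make the difference between the two kinds of shortcut in DR-Net concrete, the following NumPy sketch contrasts a residual block (output = input + F(input)) with a four-layer dense block (each layer receives the channel-wise concatenation of all earlier feature maps). It is a simplified illustration, not DR-Net itself: the abstract does not give DR-Net's kernel sizes, growth rate, or which Darknet-53 residual modules are pruned, so the 1x1 convolutions, the growth rate of 16, the 32-channel input and the 1x1 transition layer are all hypothetical choices.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1x1 convolution: x has shape (C_in, H, W), w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Residual connection: the block's output is its input plus a learned correction."""
    return x + conv1x1(relu(conv1x1(x, w1)), w2)


def dense_block(x: np.ndarray, weights: list) -> np.ndarray:
    """Dense connection: every layer sees the concatenation of all previous outputs."""
    features = [x]
    for w in weights:
        inp = np.concatenate(features, axis=0)  # channel-wise concatenation
        features.append(relu(conv1x1(inp, w)))
    return np.concatenate(features, axis=0)


def make_dense_weights(c_in: int, growth: int, n_layers: int, rng) -> list:
    """Hypothetical weights: layer i maps (c_in + i * growth) channels to `growth` channels."""
    weights, c = [], c_in
    for _ in range(n_layers):
        weights.append(rng.standard_normal((growth, c)) * 0.1)
        c += growth
    return weights


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8, 8))
y = residual_block(x, rng.standard_normal((16, 32)) * 0.1, rng.standard_normal((32, 16)) * 0.1)
z = dense_block(y, make_dense_weights(32, growth=16, n_layers=4, rng=rng))  # 32 + 4 * 16 = 96 channels
out = conv1x1(z, rng.standard_normal((32, 96)) * 0.1)  # hypothetical transition back to 32 channels
print(y.shape, z.shape, out.shape)  # (32, 8, 8) (96, 8, 8) (32, 8, 8)
```

Because the dense block only appends channels, its first 32 output channels are exactly its input; the transition layer is what keeps the channel count, and hence the parameter count of later layers, from growing with every dense layer.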
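Dilated convolution enlarges the receptive field without adding parameters: a k x k kernel with dilation rate d covers k + (k - 1)(d - 1) pixels per side. The sketch below shows this for a single channel and fuses four parallel dilated branches with a weighted sum, in the spirit of the MDC module. The dilation rates (1, 2, 3, 5), the equal fusion weights and the helper names `dilated_conv2d` and `fuse_branches` are assumptions for illustration; the abstract does not state the actual rates of MDC1 and MDC2 or how their branches are combined.

```python
import numpy as np


def effective_kernel(k: int, d: int) -> int:
    """Span covered by a k x k kernel with dilation rate d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)


def dilated_conv2d(x: np.ndarray, kernel: np.ndarray, d: int) -> np.ndarray:
    """Single-channel 2-D dilated cross-correlation with 'same' zero padding (odd k)."""
    k = kernel.shape[0]
    pad = (effective_kernel(k, d) - 1) // 2
    xp = np.pad(x, pad)  # zero padding on all sides
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # kernel tap (i, j) reads input pixels spaced d apart
            out += kernel[i, j] * xp[i * d : i * d + h, j * d : j * d + w]
    return out


def fuse_branches(x, kernels, rates, fuse_w):
    """Hypothetical MDC-style fusion: parallel dilated branches, then a 1x1 weighted sum."""
    branches = np.stack([dilated_conv2d(x, kern, d) for kern, d in zip(kernels, rates)])
    return np.tensordot(fuse_w, branches, axes=1)  # (n,) x (n, H, W) -> (H, W)


RATES = (1, 2, 3, 5)  # hypothetical dilation rates, not taken from the source
for d in RATES:
    span = effective_kernel(3, d)
    print(f"dilation {d}: 3x3 kernel spans {span}x{span}")  # spans 3, 5, 7, 11

# An impulse shows which input pixels a dilated 3x3 kernel actually touches.
impulse = np.zeros((11, 11))
impulse[5, 5] = 1.0
response = dilated_conv2d(impulse, np.ones((3, 3)), d=2)

rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) for _ in RATES]
fused = fuse_branches(rng.standard_normal((16, 16)), kernels, RATES, np.full(len(RATES), 0.25))
```

The impulse response has nine non-zero pixels spread over a 5x5 window, which is the point of dilation: the same nine weights see a wider context, and mixing several rates lets the fused map combine fine and coarse context for small heads.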
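Anchor clustering with K-means usually replaces Euclidean distance with d = 1 - IoU(box, anchor), computed on box widths and heights only. A DIoU-based variant can use d = 1 - DIoU, where DIoU = IoU - ρ²/c², ρ is the distance between the two box centres and c is the diagonal of the smallest box enclosing both. The abstract does not say how box positions enter the DIoU term, so this sketch assumes every box is placed with its top-left corner at the origin; if boxes were aligned at a common centre instead, ρ would be 0 and DIoU would reduce to IoU. The median update step (a common choice in YOLO anchor-clustering scripts), the helper names `diou_wh` and `kmeans_anchors`, and the toy box data are likewise hypothetical.

```python
import numpy as np


def diou_wh(box: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """DIoU between one (w, h) box and an (k, 2) array of (w, h) anchors.

    Assumption: all boxes share a top-left corner at the origin, so their
    centres are (w/2, h/2) and the centre-distance penalty is non-zero.
    """
    w, h = box
    aw, ah = anchors[:, 0], anchors[:, 1]
    inter = np.minimum(w, aw) * np.minimum(h, ah)
    iou = inter / (w * h + aw * ah - inter)
    rho2 = ((w - aw) / 2) ** 2 + ((h - ah) / 2) ** 2  # squared centre distance
    c2 = np.maximum(w, aw) ** 2 + np.maximum(h, ah) ** 2  # squared enclosing-box diagonal
    return iou - rho2 / c2


def kmeans_anchors(boxes: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """K-means over (w, h) pairs with distance 1 - DIoU; returns anchors sorted by area."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = np.stack([1.0 - diou_wh(b, anchors) for b in boxes])  # shape (N, k)
        labels = dist.argmin(axis=1)  # assign each box to its nearest anchor
        new = np.array([
            np.median(boxes[labels == j], axis=0) if np.any(labels == j) else anchors[j]
            for j in range(k)
        ])
        if np.allclose(new, anchors):  # assignments have stabilised
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]


# Toy (width, height) data with two obvious size groups; not from any real dataset.
boxes = np.array([[9, 11], [10, 12], [11, 13], [58, 78], [60, 80], [62, 82]], dtype=float)
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

On real data, `boxes` would hold the (width, height) of every head annotation, typically rescaled to the network input size, and k would be 9 to match YOLOv3's three anchors per detection scale.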