| With rapid advances in hardware and theory, object detection has developed for more than 20 years, moving from traditional methods based on hand-designed features to the now-dominant methods based on deep learning. Since the introduction of AlexNet in 2012, convolutional neural networks (CNNs) have increasingly been applied to object detection, and deep-learning-based detection methods have received extensive attention. These methods fall into two categories: anchor-based and anchor-free. Anchor-based algorithms are further divided into two-stage and one-stage detectors. This paper improves two one-stage object detection algorithms, SSD and RetinaNet, as follows. First, the SSD algorithm is improved to address its lack of information fusion between feature maps of different layers and its limited receptive field. The SSD object detection algorithm is described in detail, and a cross-layer fusion module and a receptive field amplification module are proposed to improve the detection accuracy of the original SSD. To address the lack of information interaction between layers in the original SSD model, a cross-layer information interaction module is designed following the idea of FPN; it both enriches the semantic information of the different layers and reduces the information differences between them. Then, to enlarge the receptive field and improve the multi-scale detection ability of the model, a receptive field amplification module is designed. Finally, batch normalization is used to reduce training time and speed up convergence. To evaluate the effectiveness of the proposed detector, ESSD, experiments are conducted on the PASCAL VOC2007 test set, the PASCAL VOC2012 test set, and the COCO
test-dev2017 set. The experimental results show that on the PASCAL VOC2007 test set, ESSD reaches an mAP of 82.1% at a detection speed of 15.7 FPS, 2.3 percentage points higher than the original SSD512; on the PASCAL VOC2012 test set, its mAP reaches 80.6%, 2.1 percentage points higher than SSD512; and on the COCO test-dev2017 set, its AP is 30.9%, 2.1 percentage points higher than SSD512. The experiments show that ESSD achieves high detection accuracy while still meeting real-time requirements. Second, because the original RetinaNet detector does not use lower-level feature maps (such as the P2 layer) and performs poorly on large objects, a shallow fusion module and a global attention module are proposed. The shallow fusion module enriches the detail information available to the model and improves RetinaNet's detection accuracy on small objects. The global attention module, added after a high-level layer (such as the P5 layer), enlarges the receptive field of the model and improves its detection accuracy on medium and large objects. Finally, a series of experiments are conducted on the public PASCAL VOC and MS COCO datasets. On the PASCAL VOC2007 test set, the mAP is 82.6%, 3.6 percentage points higher than the original RetinaNet800, while the speed decreases only slightly (13.8 FPS vs. 16.8 FPS); with ResNet-101 as the backbone network, the AP of the improved RetinaNet model on the MS COCO test set reaches 41.7%, 2.0 percentage points higher than the original RetinaNet800. |
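
The cross-layer fusion described above follows the idea of FPN, in which a deeper, coarser feature map is upsampled and merged into a shallower, finer one. The snippet below is a minimal sketch of that general top-down merge only, not the ESSD module itself: the function names `upsample_nearest_2x` and `fuse_top_down` are hypothetical, the feature maps are single-channel 2-D lists, and the learned lateral and smoothing convolutions that a real detector would use are omitted.

```python
# Illustrative sketch only: FPN-style top-down fusion of two feature maps.
# Assumptions (not from the source): single-channel maps stored as 2-D lists,
# nearest-neighbour upsampling, element-wise addition, no learned convolutions.

def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def fuse_top_down(shallow, deep):
    """Add the 2x-upsampled deep map to the shallow map element-wise."""
    up = upsample_nearest_2x(deep)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("shallow map must be exactly twice the size of the deep map")
    return [[s + u for s, u in zip(s_row, u_row)]
            for s_row, u_row in zip(shallow, up)]


# Hypothetical toy inputs: a 4x4 shallow map and a 2x2 deep map.
shallow = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
deep = [[10, 20],
        [30, 40]]

fused = fuse_top_down(shallow, deep)
for row in fused:
    print(row)
```

Each position in the fused map combines fine spatial detail from the shallow layer with the coarser, more semantic response of the deep layer, which is the kind of cross-layer information exchange the abstract attributes to the fusion module.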