| With the rapid development of convolutional neural networks (CNNs), object detection has become one of the most active areas of CNN research. Currently, a major difficulty in object detection lies in network depth and complexity. Because substantial visual information must be processed, object detection relies on deep networks to extract image features. However, excessive depth and complexity may cause vanishing or exploding gradients, which make it hard for gradient descent to converge and thereby result in a significant loss of model accuracy. Meanwhile, designing lightweight networks that are both efficient and accurate is another current challenge in the field. Because object detection models often need to be deployed on mobile and embedded devices, lightweight networks are needed to meet real-time and computational efficiency requirements. Additionally, multi-scale feature fusion is a further challenge in constructing object detection networks: how to improve detection accuracy by effectively integrating features of different scales remains difficult. Existing solutions to these problems include using advanced backbone networks to enhance feature extraction, exploiting cross-layer connections to preserve features of different scales, designing feature extraction algorithms for prediction heads to improve object classification efficiency, and using data augmentation algorithms to generate more feature samples. Hence, an important research direction in object detection is to handle more demanding detection tasks by designing and optimizing detectors with stronger generality, robustness, and feature extraction ability. This study designed and optimized a deep CNN in response to the above problems. The research was conducted in the following three aspects: 1. This paper introduced a modified object detection network
architecture, which consisted of a backbone, a neck, and a detection head. CSPDarkNet was used as the backbone. An MHSA module was embedded and, in front of it, the spatial pyramid-based FP module was embedded. To tackle challenges in object detection such as adapting to features of different scales, interference from complex backgrounds, duplicate detections, and missed detections, this paper introduced the FP module and an attention mechanism, and embedded the spatial pyramid pooling module between the backbone network and the MHSA module to retain feature information at more scales. The proposed architecture was modified on the basis of the SSD detector so that it could better handle the difficulty of recognizing complex features in object detection tasks. Testing and performance comparisons were carried out on the public COCO dataset, verifying the superiority of the proposed backbone architecture. 2. The proposed backbone architecture was modified to better suit lightweight network application environments. By introducing MobileNet into the network architecture and optimizing its algorithm, a more efficient network model was designed. The CSPDarkNet module was replaced with inverted residual modules and linear bottlenecks without compromising performance, and the network parameters were adjusted for deployment on mobile and embedded devices. Test results on the COCO dataset show that the model achieves better performance than existing lightweight and heavyweight networks, demonstrating its feasibility in object detection tasks. Additionally, the lightweight design and the regularization techniques adopted in the network also played important roles, effectively improving the model's generalization performance. 3. The proposed backbone architecture was further optimized to meet the special requirements of object detection tasks with UAV equipment. By introducing the neck
architecture BiFPN, feature maps of different scales were effectively fused to improve feature extraction and recognition. Meanwhile, the MHSA module was combined with the SSD head to add semantic information, and the anchor generation function was directly modified to generate anchors proportional to the input feature scales. Finally, by analyzing the object characteristics of the dataset, the anchor aspect ratios were modified for a single prediction head so that the candidate boxes it generated were more consistent with the ground truth. Owing to these optimizations, the network architecture was suitable for object detection on highly difficult datasets. When the network was applied to the VisDrone2021 dataset, superior results were attained, validating the model's performance. |
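
The anchor modification described in the third contribution can be illustrated with a minimal sketch. It assumes the standard SSD default-box scheme (scales spaced linearly between a minimum and a maximum, width `s * sqrt(r)` and height `s / sqrt(r)` for aspect ratio `r`) rather than the exact function used in this work. The function names, the scale bounds 0.2 and 0.9, and the aspect ratios are illustrative assumptions, and the extra ratio-1 box of the original SSD is omitted for brevity.

```python
from itertools import product
from math import sqrt


def linear_scales(num_maps, s_min=0.2, s_max=0.9):
    """Anchor scale per feature map, spaced linearly as in SSD.

    s_min and s_max are assumed values, not taken from this work.
    """
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + k * step for k in range(num_maps)]


def make_anchors(feature_size, scale, aspect_ratios):
    """Anchors (cx, cy, w, h) for one square feature map.

    All values are normalized to the image size, so the anchor size
    follows the scale assigned to the feature map, and each prediction
    head can be given its own list of aspect ratios.
    """
    anchors = []
    for row, col in product(range(feature_size), repeat=2):
        # Center of the cell in normalized image coordinates.
        cx = (col + 0.5) / feature_size
        cy = (row + 0.5) / feature_size
        for ratio in aspect_ratios:
            # The area stays scale**2; only the shape changes with the ratio.
            w = scale * sqrt(ratio)
            h = scale / sqrt(ratio)
            anchors.append((cx, cy, w, h))
    return anchors
```

For example, `make_anchors(38, 0.2, [1.0, 2.0, 0.5])` produces 38 × 38 × 3 = 4332 anchors for a 38 × 38 feature map. Tuning the aspect ratios of one head to a dataset then amounts to passing a different `aspect_ratios` list for that head only.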