| Object detection is a core topic in computer vision that aims to locate instances of semantic objects of a given class in images or videos. It extends classification tasks and underpins high-level visual tasks such as semantic segmentation and instance segmentation. In recent years, the rapid development of deep neural networks has improved the performance of object detectors. Deep-learning-based object detection algorithms now dominate this field and have important practical applications and academic research value. This paper addresses four weaknesses of the YOLOX-DarkNet53 object detection algorithm: the limited feature extraction ability of the backbone network, the limited receptive field size, insufficient small object detection performance, and insufficient interaction between location information and semantic information in the feature fusion network. The main research contents are as follows: 1. An improved convolution-based module enhancement algorithm is proposed. Cross convolution is added to the backbone network to extract richer object edge features; a new atrous convolution module is designed to replace the SPP structure, enlarging the receptive field while retaining high resolution and introducing multi-scale expression; transposed convolution adaptively upsamples the deep features so that they can be fused with the shallow features, introducing deep semantic information into the shallow layers to improve the robustness of small object detection; and an attention mechanism adaptively adjusts the feature response values to enhance useful features, suppress useless ones, and improve detection performance. The mAP increases by 1.4% and 3.3% on the PASCAL VOC and SSDD datasets, respectively, which verifies the effectiveness of the algorithm. 2. An improved feature fusion algorithm is proposed. The Res2Net module is introduced to enlarge the receptive field of each layer of feature maps and to improve the multi-scale expression ability of the network at a finer-grained level; the FPN/PAN feature fusion structure commonly used in object detection is improved to make full use of the rich location and semantic information of the backbone network and to raise the accuracy of small object detection; and the HRNet idea of fusing features of different layers, sizes, and depths is adopted to fully exploit deep, highly semantic feature information and shallow, fine-grained feature information, improving the expressive ability of the model. On the PASCAL VOC and SSDD datasets, mAP increases by 3.4% and 6.0%, respectively, which verifies that the algorithm effectively improves detection performance. |
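
The size arithmetic behind two of the modules named in the abstract, the atrous convolution that replaces SPP and the transposed convolution used for upsampling, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names, the 20x20 feature map, the 3x3 kernel, and the dilation rates are all assumptions chosen for the example. Only the formulas themselves are standard (the usual output-size relations for dilated and transposed convolutions).

```python
# Hypothetical helpers illustrating output-size arithmetic along one spatial axis.
# None of the concrete numbers below come from the paper.

def effective_kernel(kernel: int, dilation: int) -> int:
    """Span covered by a dilated (atrous) kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_out(size: int, kernel: int, stride: int = 1,
             padding: int = 0, dilation: int = 1) -> int:
    """Output length of a (possibly dilated) convolution."""
    return (size + 2 * padding - effective_kernel(kernel, dilation)) // stride + 1


def same_padding(kernel: int, dilation: int) -> int:
    """Padding that preserves the size for stride 1 and an odd kernel."""
    return (effective_kernel(kernel, dilation) - 1) // 2


def transposed_conv_out(size: int, kernel: int, stride: int,
                        padding: int = 0, output_padding: int = 0) -> int:
    """Output length of a transposed convolution (dilation fixed at 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


if __name__ == "__main__":
    feature_size = 20  # assumed deep feature map size
    for rate in (1, 2, 6):  # assumed dilation rates
        pad = same_padding(3, rate)
        print(rate, effective_kernel(3, rate),
              conv_out(feature_size, 3, padding=pad, dilation=rate))
    # Assumed 4x4 transposed kernel, stride 2, padding 1: doubles the size.
    print(transposed_conv_out(feature_size, kernel=4, stride=2, padding=1))
```

With stride 1 and matching padding, each dilation rate leaves the feature map size unchanged while the kernel covers a wider span, which is the sense in which atrous convolution enlarges the receptive field while retaining resolution. The transposed convolution doubles the deep feature map so that it matches the next shallower level before fusion.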