The rapid development of computer hardware and software has greatly promoted the development and application of computer vision technology. Computer vision technology, typified by object detection, now serves many aspects of production and daily life, such as face recognition, security monitoring, and intelligent driving. By enabling computers to understand and analyze images and videos, it frees people from monotonous, repetitive labor and accelerates the digital transformation of society. As one of the fundamental research topics in computer vision, object detection aims to accurately identify and locate all objects of interest in each input image, and the quality of its detections directly determines the performance of object tracking, video analysis, and other related computer vision algorithms that build on it. Research on object detection has shifted from traditional algorithms to deep learning algorithms. However, detecting multiple target objects with large variations in scale still needs better solutions. In this thesis, we improve the detection accuracy of multi-scale objects by improving the abstract feature representation of the convolutional neural network and the feature fusion design of the feature pyramid network, and we propose a new detector model. Specifically, the main research contents and contributions of this thesis are as follows:

(1) This thesis proposes a deep convolutional neural network with variable receptive fields. Most existing detectors directly use convolutional neural networks designed for the classification task as backbone networks to extract image features. The classification task places low demands on the abstract representation of image features, whereas the detection task must simultaneously classify and localize objects of multiple types and scales, which requires the image features extracted by the backbone network to carry both good semantic information and spatial detail. This thesis analyzes the important influence of the receptive field on detection results and introduces a spatial attention mechanism into the existing selective kernel convolution module, so that it can dynamically adjust neuronal receptive fields along the channel and spatial dimensions at the same time. On this basis, it proposes a selective kernel residual unit and a ResNet-based convolutional neural network with variable receptive fields, realizing image feature extraction optimized for the detection task. The experimental data illustrate the efficiency of this method.

(2) This thesis proposes a variant of the feature pyramid network based on adaptive feature fusion. The feature pyramid network improves multi-scale detection performance by adding deep semantic information to shallow feature maps. Most existing feature fusion implementations use heuristic designs, which are prone to an uneven distribution of semantic and spatial information; the multi-step fusion module proposed in this thesis addresses this imbalance. To meet the needs of feature fusion at different scales, a global context module is introduced to realize adaptive deep feature fusion, which improves the autonomous learning ability of the feature pyramid network. Experimental results validate the design.

(3) This thesis proposes a new detector model based on the baseline detector FCOS. We replace its ResNet-FPN backbone network with the two improved networks above and associate the center-ness branch of the detection head with both the classification and regression branches. Comparative experiments on a public dataset show that the detector model performs significantly better than FCOS and reaches the level of advanced detectors of the same type.
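
For readers unfamiliar with the center-ness branch mentioned in contribution (3), the following minimal sketch computes the center-ness target as defined in the original FCOS detector: for a location inside a ground-truth box with distances l, t, r, b to the left, top, right, and bottom sides, center-ness is sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b))). This illustrates only the baseline FCOS quantity, not the modified detection head proposed in this thesis. The function names `ltrb_targets` and `centerness` are hypothetical, and returning 0.0 for locations on or outside the box is an assumption made for this sketch.

```python
import math


def ltrb_targets(x, y, box):
    """Distances from location (x, y) to the four sides of a box.

    `box` is (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
    Returns (l, t, r, b): distances to the left, top, right, bottom sides.
    """
    x0, y0, x1, y1 = box
    return x - x0, y - y0, x1 - x, y1 - y


def centerness(l, t, r, b):
    """Center-ness target of the baseline FCOS detector.

    Equals 1.0 at the box center and decays toward 0 near the box sides.
    Assumption for this sketch: locations on or outside the box
    (any distance <= 0) get 0.0 instead of being filtered out beforehand.
    """
    if min(l, t, r, b) <= 0:
        return 0.0
    horizontal = min(l, r) / max(l, r)
    vertical = min(t, b) / max(t, b)
    return math.sqrt(horizontal * vertical)


if __name__ == "__main__":
    box = (0.0, 0.0, 10.0, 10.0)
    for point in [(5.0, 5.0), (1.0, 5.0), (9.0, 9.0)]:
        score = centerness(*ltrb_targets(point[0], point[1], box))
        print(point, round(score, 4))
```

In FCOS, this value is used at inference time to down-weight predictions made far from an object's center, so that low-quality boxes are more likely to be suppressed during non-maximum suppression.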