Object detection is one of the fundamental tasks in computer vision. It uses computers to process digital images, detect target objects of interest in them, and determine their categories and locations. The technology is widely used in intelligent transportation, industrial inspection, public safety, and other fields. 3D object detection is central to perceiving and understanding real scenes: compared with 2D object detection, it provides richer spatial information about the environment, and it therefore has broad application prospects in autonomous driving, home service robots, and augmented/virtual reality.

This paper focuses on the performance of anchor-free object detection networks in 2D and monocular 3D object detection tasks, and proposes object detection methods based on anchor-free networks and feature fusion. As a representative anchor-free detector, CenterNet has a simple structure and strong versatility. It has received extensive attention since its publication and has become an important baseline that researchers optimize and apply in 2D and monocular 3D object detection. This paper adopts CenterNet as the baseline algorithm and improves it through a series of measures. The specific research contents are as follows:

(1) To address the insufficient detection capability of the anchor-free model CenterNet, this paper proposes an improved detection model that combines an attention mechanism with dilated convolution. First, to strengthen the network's ability to capture the semantic and positional features of targets, a novel non-local attention mechanism is designed to capture long-range dependencies of the target in both the channel and spatial domains. Second, to improve the network's representation of objects at different scales, a multi-scale feature fusion module based on dilated convolution is designed; it uses a residual structure to fuse features from multiple scales, preserving the feature information obtained for the target at each scale. Finally, the proposed algorithm is evaluated on the PASCAL VOC dataset, where its detection accuracy is 2.47% higher than that of the baseline CenterNet, effectively improving the performance of anchor-free object detection.

(2) To improve the detection capability of the anchor-free CenterNet network, an improved CenterNet based on attention feature fusion and a multi-scale feature extraction network is proposed. First, to enhance the network's representation of multi-scale objects, an adaptive multi-scale feature extraction network is designed; it uses dilated convolution to resample the feature map, obtains multi-scale feature information, and fuses these features in the spatial dimension. Second, to better fuse features that are inconsistent in semantics and scale, a feature fusion module based on channel-wise local attention is proposed; it adaptively learns the fusion weights between shallow and deep features, retaining key information from different receptive fields. Finally, experiments on the VOC 2007 test set show that the final algorithm achieves a detection accuracy of 80.94%, which is 3.82% higher than the baseline CenterNet, effectively improving the performance of anchor-free object detection.

(3) To address the lack of depth information and the poor detection accuracy of monocular 3D object detection algorithms, a multi-scale monocular 3D object detection algorithm based on instance depth is proposed. First, to strengthen the model's handling of targets at different scales, a multi-scale sensing module based on dilated convolution is designed; because feature maps at different scales are inconsistent, the depth features containing multi-scale information are further refined along both the spatial and channel directions. Second, to give the model better 3D perception, instance depth is introduced as an auxiliary learning task that enhances the spatial depth features of 3D targets, and sparse instance depth is used to supervise this auxiliary task. Finally, the algorithm is evaluated on the KITTI test and validation sets. Experimental results show that the proposed method improves AP40 for the Car category by 5.27% over the baseline, effectively improving the detection performance of monocular 3D object detection.
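CenterNet, the baseline used throughout this work, predicts a heatmap of object centers and reads detections off its local peaks instead of matching anchor boxes. The following is a minimal NumPy sketch of that peak-extraction step, assuming a single-class heatmap and a 3x3 neighbourhood; `decode_centers` is a hypothetical helper, and the size and offset regression heads are omitted.

```python
import numpy as np


def decode_centers(heatmap, k=3):
    """Hypothetical helper: return (score, y, x) for the top-k local maxima of a heatmap.

    A pixel survives only if it equals the maximum of its 3x3 neighbourhood,
    the max-pooling trick CenterNet uses in place of box-level NMS.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    h, w = heatmap.shape
    # pad with -inf so border pixels are compared only with real neighbours
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # 3x3 max pooling with stride 1, built from the nine shifted views
    pooled = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    keep = heatmap == pooled
    scores = np.where(keep, heatmap, -np.inf).ravel()
    order = np.argsort(scores)[::-1][:k]  # highest scores first
    return [(float(scores[i]), int(i // w), int(i % w))
            for i in order if np.isfinite(scores[i])]
```

A real decoder would also apply a score threshold and run this per class; the sketch keeps only the part that makes the method anchor-free.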
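The dilated-convolution multi-scale fusion module in contribution (1) is described only at a high level. As an illustration, assuming single-channel 3x3 kernels, dilation rates 1, 2 and 3, a ReLU per branch, and averaging as the fusion operator (all hypothetical choices, not the paper's configuration), parallel dilated branches joined by a residual connection can be sketched in NumPy like this:

```python
import numpy as np


def dilated_conv2d(x, kernel, rate):
    """'Same'-size 3x3 cross-correlation of a 2D map with dilation `rate`."""
    h, w = x.shape
    padded = np.pad(x, rate)  # zero padding so the output keeps shape (h, w)
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            # kernel tap (i, j) reads the input at offset ((i - 1) * rate, (j - 1) * rate)
            out += kernel[i, j] * padded[i * rate:i * rate + h, j * rate:j * rate + w]
    return out


def multiscale_residual_fusion(x, kernels, rates=(1, 2, 3)):
    """Hypothetical module: one dilated branch per rate (ReLU after each),
    branches averaged, then added back to the input as a residual."""
    branches = [np.maximum(dilated_conv2d(x, k, r), 0.0) for k, r in zip(kernels, rates)]
    return x + np.mean(branches, axis=0)
```

With dilation rate r, a 3x3 kernel covers a (2r+1)x(2r+1) window without extra parameters, which is why combining several rates gives multi-scale context cheaply; the residual path keeps the original features available alongside the fused ones.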
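For the channel-wise local attention fusion in contribution (2), one plausible reading is an ECA-style gate: pool a per-channel descriptor, let each channel interact only with its neighbours through a small 1D convolution, and use the sigmoid output as the mixing weight between shallow and deep features. The sketch below assumes exactly that; the descriptor (pooling the sum of both inputs), the kernel size, and the convex-combination rule are assumptions, and in a trained network `conv1d` would be learned.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def local_channel_attention_fuse(shallow, deep, conv1d):
    """Hypothetical fusion of two (C, H, W) maps with per-channel weights.

    1. Global-average-pool (shallow + deep) to a length-C descriptor.
    2. Mix each channel only with its neighbours via an odd-length 1D kernel.
    3. Sigmoid gives w in (0, 1); output = w * shallow + (1 - w) * deep.
    """
    assert len(conv1d) % 2 == 1, "use an odd kernel so the output keeps C channels"
    desc = (shallow + deep).mean(axis=(1, 2))  # shape (C,)
    pad = len(conv1d) // 2
    # np.convolve flips its kernel, so pass it reversed to get cross-correlation
    mixed = np.convolve(np.pad(desc, pad), conv1d[::-1], mode="valid")  # shape (C,)
    w = sigmoid(mixed)[:, None, None]  # broadcast over H and W
    return w * shallow + (1.0 - w) * deep
```

Restricting the 1D kernel to a few neighbouring channels keeps the gate very light compared with a fully connected channel-attention block, which matches the "local" qualifier in the description.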
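The non-local attention in contribution (1) captures long-range dependencies in both the spatial and channel domains. A stripped-down NumPy sketch of that general idea follows: dot-product affinities, softmax normalisation, and a residual sum of a spatial branch and a channel branch. It assumes identity projections in place of the learned 1x1 convolutions a real module would use, and the function names are hypothetical, so it illustrates the mechanism rather than the paper's specific design.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_context(f):
    """f: (C, N). Each position aggregates features from all N positions."""
    attn = softmax(f.T @ f, axis=-1)  # (N, N) position-to-position affinity
    return f @ attn.T                 # column i = sum_j attn[i, j] * f[:, j]


def channel_context(f):
    """f: (C, N). Each channel aggregates all C channels."""
    attn = softmax(f @ f.T, axis=-1)  # (C, C) channel affinity
    return attn @ f


def dual_nonlocal(x):
    """x: (C, H, W). Residual sum of the spatial and channel branches."""
    c, h, w = x.shape
    f = x.reshape(c, h * w)
    return x + (spatial_context(f) + channel_context(f)).reshape(c, h, w)
```

Because the spatial affinity matrix spans all H*W positions, a single attention step lets any pixel influence every other pixel, whereas a 3x3 convolution only reaches its immediate neighbours; the price is an (HW)x(HW) matrix, which is why such modules are usually applied to low-resolution feature maps.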
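Contribution (3) supervises the auxiliary instance-depth task with sparse labels, meaning only some pixels (for example, those where projected LiDAR points fall on an object) carry a ground-truth depth. Assuming an L1 objective, which the abstract does not specify, a masked loss that simply ignores unlabelled pixels can be sketched as follows; `sparse_depth_l1` and its arguments are hypothetical.

```python
import numpy as np


def sparse_depth_l1(pred, target, valid):
    """Hypothetical loss: mean absolute depth error over labelled pixels only.

    pred, target: (H, W) depth maps; valid: (H, W) mask, nonzero where a label exists.
    """
    valid = np.asarray(valid).astype(bool)
    if not valid.any():
        return 0.0  # no labelled pixel: contribute no loss rather than divide by zero
    return float(np.abs(pred[valid] - target[valid]).mean())
```

Averaging over the labelled pixels rather than the whole map keeps the loss scale independent of how sparse the supervision is in a given image.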
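The KITTI result in contribution (3) is reported as AP40, which averages interpolated precision over 40 equally spaced recall positions (1/40, 2/40, ..., 1); unlike the older 11-point metric, it does not sample recall 0. A simplified NumPy sketch is below. It takes an already computed precision-recall curve as input, whereas the official KITTI evaluation additionally performs 3D IoU matching and difficulty filtering; `ap_r40` is a hypothetical name.

```python
import numpy as np


def ap_r40(recall, precision):
    """Hypothetical helper: AP averaged over 40 recall positions 1/40, ..., 1.

    At each sample point r, precision is interpolated as the best precision
    achieved at any recall >= r (0 if that recall is never reached).
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    total = 0.0
    for r in np.linspace(1 / 40, 1.0, 40):
        reached = precision[recall >= r]
        total += reached.max() if reached.size else 0.0
    return total / 40
```

For example, a detector whose curve stops at recall 0.51 with precision 0.8 scores 20 * 0.8 / 40 = 0.4, because only the first 20 recall positions are reached.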