Vision-based object detection plays an indispensable role in environmental perception systems. As a research hotspot in image processing and computer vision, it helps autonomous driving systems detect objects such as vehicles, pedestrians, and traffic signs, and it is one of the key technologies for achieving autonomous navigation and improving traffic safety. Optimizing and improving vision-based object detection therefore has significant implications for both research and application. In autonomous driving, image-based object detection is one of the core and most challenging problems; solving it effectively will advance autonomous driving technology as a whole and further enhance the safety and intelligence of autonomous vehicles on the road. In recent years, the rapid development of deep learning has brought significant progress in both 2D and 3D object detection, and these technologies will continue to play an important role in future research and applications. However, 2D object detection can only detect and locate objects on the two-dimensional image plane and cannot determine their distance, so it does not meet the requirements of autonomous driving. Unlike 2D object detection, 3D object detection must also localize objects in space, which determines their distance. Monocular 3D object detection uses a single camera. Because a monocular camera has a simple structure, is easy to deploy, and is low in cost, monocular 3D detection is an approach that the industry urgently needs to research and develop. At present, 3D object detection based on monocular images still suffers from low prediction accuracy and long detection time. To achieve high accuracy and real-time performance for monocular 3D object detection, the main work of this paper is as follows: (1) An anchor-free 
monocular 3D object detection network based on geometric keypoints is proposed, which uses the object height and the geometry of its projection onto the image to estimate the depth of the object. At the same time, to improve the detection accuracy for long-distance small objects, a loss function guided by area and depth information is defined so that the model pays more attention to long-distance small objects during training. Specifically, the proposed network takes a monocular image as input and passes it through the backbone network and the decoding module; the detection heads predict the corresponding object attributes, and a geometric reasoning module then computes the position and pose of each object to complete the detection. The proposed method is trained and validated on the KITTI dataset. Compared with the baseline method KM3D, the detection accuracy is significantly improved, especially in the small-object Cyclist category, with detection accuracy increasing by about 1%. (2) To address the large number of network parameters and the low detection efficiency of the above-mentioned 3D object detection model, this paper further introduces the recently reported backbone network RepVGG and, on this basis, designs a channel-dimension information aggregation module, yielding a monocular 3D object detection model with low power consumption and high computational efficiency. Experimental results show that the proposed model improves significantly over directly applying RepVGG to the KM3D method while being more lightweight. In addition, it should be noted that, compared with the ResNet18-based KM3D method, the detection accuracy of the lightweight model declines, but detection is 3 ms faster.
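
The depth estimate from object height described in contribution (1) can be illustrated with the standard pinhole-camera relation z = f · H / h, where f is the focal length in pixels, H the physical object height in metres, and h the projected height in pixels. The sketch below is a minimal illustration under stated assumptions, not the network's actual geometric reasoning module: the function name `depth_from_height`, its parameters, and the numeric values are hypothetical, and it assumes an upright object seen by an undistorted pinhole camera.

```python
def depth_from_height(focal_px: float, height_m: float, pixel_height: float) -> float:
    """Estimate object depth (metres) from the pinhole relation z = f * H / h.

    focal_px     -- focal length in pixels (assumed known from calibration)
    height_m     -- physical object height in metres (e.g. a predicted dimension)
    pixel_height -- projected object height in the image, in pixels
    All names are illustrative and do not come from the original method.
    """
    if pixel_height <= 0:
        raise ValueError("pixel_height must be positive")
    return focal_px * height_m / pixel_height


if __name__ == "__main__":
    # Illustrative values only: a 1.5 m tall object, 720 px focal length.
    near = depth_from_height(720.0, 1.5, 72.0)  # large projection, close object
    far = depth_from_height(720.0, 1.5, 36.0)   # half the projection, twice the depth
    print(near, far)
```

Because h sits in the denominator, halving the projected height doubles the estimated depth, and a one-pixel error in h shifts the estimate far more for a distant object than for a near one. This is consistent with the abstract's motivation for giving long-distance small objects extra weight in the loss.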