| In recent years, remote sensing technology has developed continuously, and remote sensing images now play an important role in national defense, military affairs, and everyday life and production. As remote sensing imagery accumulates, manual interpretation and traditional methods based on hand-designed image features can no longer meet the diverse needs of downstream tasks. With the rapid development of deep learning, researchers have applied it to remote sensing image object detection and achieved significant results. However, compared with natural scenes, remote sensing images have high resolution, widely varying target scales, arbitrary target orientations, densely arranged small targets, and complex backgrounds, all of which pose great challenges for object detection. General-purpose object detection algorithms struggle to adapt to the complex distribution of targets in remote sensing images. Building on existing algorithms, this paper investigates the application of convolutional neural networks (CNNs) and the Transformer architecture to remote sensing image object detection. For CNN-based detection, a remote sensing image object detection method that balances rotated and horizontal bounding boxes is proposed. To cope with the complex backgrounds and target distributions in remote sensing images, an ML-FPN (Multi-layer-enhanced FPN) structure is used to strengthen feature fusion and extract more effective feature representations. An RH-head (balanced rotational and horizontal bounding boxes detection head) is proposed to address the bounding box deformation caused by angle prediction for near-horizontal targets. By introducing a rotation parameter, the network can automatically determine the orientation state of each target. Near-horizontal targets are represented by 
horizontal bounding boxes, while inclined targets are represented by rotated bounding boxes. The optimal angle threshold for distinguishing near-horizontal from inclined targets is determined by testing different thresholds on the DOTA dataset. Experimental results show that the rotation parameter effectively enhances the stability of the network’s predictions, thereby improving detection accuracy. For Transformer-based detection, a fully end-to-end detection network built on the Transformer architecture is designed. This paper first analyzes the feature extraction ability of Transformer models. Experiments show that, on large-scale datasets, Transformer models offer stronger feature representations than CNN models. For object detection, building on DETR, this paper uses a technique similar to two-stage networks: a candidate box is predicted for each individual feature, and the highest-scoring candidates are selected as the object queries fed to the decoder. In addition, denoising training is introduced to accelerate convergence and stabilize bipartite graph matching. Compared with CNN-based detectors, the Transformer model requires neither complex hand-crafted design nor NMS post-processing. It directly outputs the final predictions, making it a truly end-to-end detection network. Experimental results on the DOTA dataset also demonstrate the effectiveness of the Transformer architecture for object detection. |
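
The angle-threshold rule described for the RH-head can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `(cx, cy, w, h, angle)` box layout, and the 10-degree default threshold are all assumptions made for illustration, and the abstract does not state the threshold that was found to be optimal on DOTA.

```python
import math

def rotation_flag(angle_deg, threshold_deg=10.0):
    """Return 1 for an inclined target, 0 for a near-horizontal one.

    threshold_deg is a placeholder value (assumption), not the paper's optimum.
    """
    # Deviation of the box orientation from the nearest image axis, in [0, 45].
    a = abs(angle_deg) % 90.0
    deviation = min(a, 90.0 - a)
    return 1 if deviation > threshold_deg else 0

def enclosing_horizontal_box(cx, cy, w, h, angle_deg):
    """Axis-aligned box (x1, y1, x2, y2) that encloses the rotated box."""
    t = math.radians(angle_deg)
    c, s = abs(math.cos(t)), abs(math.sin(t))
    bw = w * c + h * s  # width of the enclosing box
    bh = w * s + h * c  # height of the enclosing box
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)

def encode_target(cx, cy, w, h, angle_deg, threshold_deg=10.0):
    """Pick the box representation for one target (hypothetical helper)."""
    if rotation_flag(angle_deg, threshold_deg):
        # Inclined target: keep the rotated box, including its angle.
        return {"rotated": True, "box": (cx, cy, w, h, angle_deg)}
    # Near-horizontal target: drop the angle and use a horizontal box.
    return {"rotated": False,
            "box": enclosing_horizontal_box(cx, cy, w, h, angle_deg)}
```

In this sketch the flag is derived from a known angle, as one might do when building training labels; in the method described above, the rotation parameter is predicted by the network itself.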