| In recent years, thanks to breakthroughs in satellite imaging technology and the development of deep learning, the use of deep learning algorithms for processing remote sensing images has become an important area of research. As a special object detection task, earth object observation based on remote sensing images is a hot topic in computer vision and has important application value and research significance in both military and civilian fields. However, teaching machines to detect objects in remote sensing images remains challenging because of complex scenes, numerous categories with imbalanced sample counts, and multi-scale variation between objects of different categories. As a result, current deep learning-based detection algorithms perform poorly on remote sensing images. Therefore, drawing on relevant theory from computer vision, this paper proposes a multi-scale object detection technique based on scene perception and reconstruction to address these difficulties. The main research content comprises the following key steps. (1) To address the problem of complex backgrounds and multi-scale objects in remote sensing images, this paper proposes an object feature adaptive network based on a visual attention mechanism and a dense feature pyramid network. First, the feature extraction network is built on a channel-space-energy attention mechanism. The attention mechanism adaptively derives a weight vector for the features, estimates the importance of each feature, and enables the model to extract target feature information effectively even against complex backgrounds. Then, the dense feature pyramid network receives the features extracted by the feature extraction network in its last four layers and performs multi-level, multi-scale, multi-connection feature fusion, which combines shallow texture features with deep semantic features, fully
addressing the problem of differing target sizes and scales. (2) Because the categories are imbalanced in number and many objects with similar features are hard to distinguish, it is difficult for samples to be matched with appropriate labels when training a deep neural network. To address this problem, this paper constructs a decoupling network based on the vision transformer. The network flattens the 3D feature map into 2D vectors, calculates the relationships between feature points across these vectors, and reconstructs the feature scene to enhance the feature representation of objects in different categories. It also decouples the output coordinate information from the category information and optimizes the network parameters with a dynamic label matching method under the anchor-free paradigm. Specifically, parameter optimization combines an anchor-free object detection method based on centroid regression with a dynamic label matching strategy; the regression loss is computed with CIoU, while the classification loss uses a weighted focal loss to balance the number of different samples, so as to achieve efficient, high-accuracy detection of categories with imbalanced sample counts. Based on these key steps, this paper develops a multi-scale object detection technique for remote sensing images using deep scene perception and reconstruction. The proposed algorithm is evaluated through comparative and ablation experiments on DIOR, a large-scale open-source optical remote sensing object detection dataset. The results demonstrate the feasibility and effectiveness of the proposed method for multi-scale object detection in complex optical remote sensing images. |
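
The two loss terms named in step (2) can be illustrated with a minimal sketch. It uses the commonly published forms of the CIoU regression loss and the alpha-weighted focal loss, which are assumed here and may differ from the exact variants used in the paper. The function names `ciou_loss` and `weighted_focal_loss`, the `(x1, y1, x2, y2)` box format, and the defaults `alpha=0.25` and `gamma=2.0` are illustrative assumptions, not values taken from the paper.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumed standard form: 1 - IoU + rho^2 / c^2 + alpha * v.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union of the predicted and ground-truth boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centres (rho^2).
    cdist2 = ((px1 + px2) - (gx1 + gx2)) ** 2 / 4 \
        + ((py1 + py2) - (gy1 + gy2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both boxes (c^2).
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    diag2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))
    ) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - iou + cdist2 / diag2 + alpha * v


def weighted_focal_loss(prob, target, alpha=0.25, gamma=2.0, eps=1e-9):
    """Binary focal loss for one predicted probability and a 0/1 target.

    alpha weights the positive class; gamma down-weights easy examples.
    Both defaults are assumptions for illustration only.
    """
    p_t = prob if target == 1 else 1 - prob
    alpha_t = alpha if target == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t + eps)
```

In this sketch, identical boxes give a CIoU loss close to zero, while non-overlapping boxes give a loss above 1 because the centre-distance penalty is added on top of `1 - IoU`. For the focal term, a confidently correct prediction such as `weighted_focal_loss(0.9, 1)` contributes far less than a hard one such as `weighted_focal_loss(0.1, 1)`, which is how the loss shifts training emphasis toward hard or under-represented samples.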