| In the development of object detection, detecting objects in aerial images has long been both a research hotspot and a research difficulty. Aerial target detection is in wide demand in both civil and military fields and therefore has high research value. The main difficulties are the small proportion of the image occupied by each target, the varied shooting angles of the cameras, and the differing imaging characteristics of different equipment, which lead to low detection accuracy and poor localization of aerial objects. To address these problems, a lightweight one-stage detection network, RIDNet (densely connected Inception ResNet), is proposed. The main work of this paper is as follows. (1) A target detection network, RIDNet V1, built on a lightweight feature extraction network is proposed. Inspired by the Inception network and DenseNet, a multi-branch residual dense block, RI-Dense, is designed. RI-Dense contains multiple branches, each with a different convolution kernel size, so that the branches extract features at different scales and the network can detect targets of multiple scales. An Atrous Spatial Pyramid Pooling (ASPP) module is then added before the feature extraction network to provide multi-scale features for subsequent detection. Finally, the lightweight target detection network RIDNet V1 is constructed from the lightweight feature extraction network and ASPP. (2) This paper proposes a semantic enhancement network based on multi-path feature fusion and uses it to construct the improved network RIDNet. Semantic enhancement is a concept from the field of semantic segmentation; in this paper, a semantic enhancement network is designed for object detection following the encoder-decoder idea. The encoder is the RI-Dense feature extraction network, and a decoder attached after it recovers the spatial dimensions and details of the feature maps. Based on the idea of the Inception network, this paper proposes a multi-branch feature fusion module named
RI-Deconv. A semantic enhancement network is then built on RI-Deconv, which improves the detection accuracy of the algorithm. Integrating the RI-Deconv module into RIDNet V1 yields the improved lightweight target detection network RIDNet. (3) An image stitching method is proposed, and the loss function is modified. Because most aerial images are large, they must be cropped into patches before being fed to the network, and during cropping some targets are inevitably split into several parts. The proposed image stitching method recombines these target fragments from different image patches. In addition, the loss function of SSD is modified. To avoid the imbalance between positive and negative samples, SSD does not use all prediction results when computing the loss value: it keeps the positive samples but only the negative samples with the highest loss values (hard negative mining). In this paper, this loss is replaced with the focal loss, so that the losses of all predictions are included in the final loss value, and weighting coefficients control the influence of positive and negative samples on the loss, which addresses the sample imbalance problem. A set of experiments on two public aerial datasets, NWPU VHR-10 and DOTA, verifies the effectiveness of the proposed method, and the proposed detection models RIDNet V1 and RIDNet are compared with six popular target detection networks. On the more challenging DOTA dataset, the proposed method achieves higher detection accuracy than all six compared algorithms. On DOTA, the detection time of RIDNet is 47.4 ms and its model size is 39.4 MB, showing that the model meets both lightweight and real-time requirements. In the ablation experiments, combinations of different feature extraction networks and detector networks are tested, and the proposed structure achieves the highest detection accuracy. To select the most accurate combination of feature maps, the influence of each of the five feature maps used for detection on the detection accuracy is also tested. |
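The multi-scale effect of ASPP comes from running the same small kernel at several dilation (atrous) rates in parallel: a k-tap kernel at rate r covers (k - 1) * r + 1 input samples without adding weights. The following is a minimal one-dimensional sketch in plain Python under stated assumptions; it is not the configuration used in RIDNet. The function names `atrous_conv1d`, `aspp1d`, and `effective_span` and the rates (1, 2, 3) are hypothetical, and a real ASPP (for example in DeepLabv3, which uses rates 6, 12, 18 at output stride 16) works on 2-D feature maps, adds an image-level pooling branch, and fuses the branches with a 1x1 convolution, all omitted here.

```python
from typing import List, Sequence, Tuple


def atrous_conv1d(x: Sequence[float], w: Sequence[float], rate: int) -> List[float]:
    """1-D atrous (dilated) convolution with zero 'same' padding.

    As in deep-learning frameworks this is cross-correlation: tap j reads
    x[i + (j - c) * rate], where c is the kernel centre, so a k-tap kernel
    covers (k - 1) * rate + 1 input samples.
    """
    if len(w) % 2 == 0:
        raise ValueError("kernel length must be odd for centred 'same' padding")
    c = len(w) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + (j - c) * rate
            if 0 <= idx < len(x):  # positions outside the signal count as zero
                acc += wj * x[idx]
        out.append(acc)
    return out


def aspp1d(x: Sequence[float], branches: Sequence[Tuple[Sequence[float], int]]) -> List[List[float]]:
    """Run parallel atrous branches, each given as (kernel, rate), on the same input
    and stack their outputs as channels (the 1x1 fusion conv is omitted)."""
    return [atrous_conv1d(x, w, r) for w, r in branches]


def effective_span(k: int, rate: int) -> int:
    """Number of input samples covered by a k-tap kernel at the given dilation rate."""
    return (k - 1) * rate + 1


if __name__ == "__main__":
    impulse = [0.0] * 9
    impulse[4] = 1.0  # unit impulse at the centre
    feats = aspp1d(impulse, [([1.0, 1.0, 1.0], r) for r in (1, 2, 3)])
    for r, f in zip((1, 2, 3), feats):
        # prints "1 [3, 4, 5]", then "2 [2, 4, 6]", then "3 [1, 4, 7]"
        print(r, [i for i, v in enumerate(f) if v])
```

Feeding a unit impulse shows that each branch samples the input at a different spacing, so the stacked channels see context at several scales for the same parameter count.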
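The abstract does not specify the stitching rule, so the following is only a hypothetical sketch of one simple way to recombine fragments of a target that was cut by patch boundaries. It assumes axis-aligned boxes `(x1, y1, x2, y2)` in pixels, square tiles, and a greedy rule: a detection is flagged as a fragment if it touches a tile edge that is an interior cut line, and two same-class fragments are merged into their union box when they touch along one axis and share most of their extent on the other. All names (`tile_origins`, `to_global`, `stitch`) and thresholds (`tol`, `gap`, `min_ratio`) are illustrative assumptions, not the thesis's method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Det = Tuple[str, Box, bool]              # (label, box, touches_a_cut_edge)


def tile_origins(size: int, tile: int, stride: int) -> List[int]:
    """1-D tile start positions; the last tile is shifted back to end at the border."""
    starts = list(range(0, max(size - tile, 0) + 1, stride))
    if starts[-1] + tile < size:
        starts.append(size - tile)
    return starts


def to_global(label: str, box: Box, ox: int, oy: int, tile: int,
              img_w: int, img_h: int, tol: float = 1.0) -> Det:
    """Shift a patch-local box by the tile origin (ox, oy) and flag it if it touches
    a tile edge that is a cut line, i.e. not the outer border of the image."""
    x1, y1, x2, y2 = box
    cut = ((x1 <= tol and ox > 0) or
           (x2 >= tile - tol and ox + tile < img_w) or
           (y1 <= tol and oy > 0) or
           (y2 >= tile - tol and oy + tile < img_h))
    return (label, (x1 + ox, y1 + oy, x2 + ox, y2 + oy), bool(cut))


def _overlap(a1: float, a2: float, b1: float, b2: float) -> float:
    return max(0.0, min(a2, b2) - max(a1, b1))


def _are_fragments(a: Box, b: Box, gap: float, min_ratio: float) -> bool:
    """True if a and b touch or overlap on one axis and mostly overlap on the other."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    x_close = max(ax1, bx1) - min(ax2, bx2) <= gap
    y_close = max(ay1, by1) - min(ay2, by2) <= gap
    x_share = _overlap(ax1, ax2, bx1, bx2) / max(min(ax2 - ax1, bx2 - bx1), 1e-9)
    y_share = _overlap(ay1, ay2, by1, by2) / max(min(ay2 - ay1, by2 - by1), 1e-9)
    return (x_close and y_share >= min_ratio) or (y_close and x_share >= min_ratio)


def stitch(dets: List[Det], gap: float = 2.0, min_ratio: float = 0.5) -> List[Tuple[str, Box]]:
    """Greedily union same-label cut-edge fragments until no pair can be merged."""
    items = [[lbl, box, cut] for lbl, box, cut in dets]
    merged = True
    while merged:
        merged = False
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                a, b = items[i], items[j]
                if a[0] == b[0] and a[2] and b[2] and _are_fragments(a[1], b[1], gap, min_ratio):
                    (ax1, ay1, ax2, ay2), (bx1, by1, bx2, by2) = a[1], b[1]
                    a[1] = (min(ax1, bx1), min(ay1, by1), max(ax2, bx2), max(ay2, by2))
                    del items[j]  # the merged box stays flagged, so it can absorb more fragments
                    merged = True
                    break
            if merged:
                break
    return [(lbl, box) for lbl, box, _ in items]
```

For example, on a 200 x 100 image cut into two 100-pixel tiles, a ship detected as `(80, 40, 100, 60)` in the left tile and `(0, 40, 30, 60)` in the right tile is restored to `(80, 40, 130, 60)`. A rule this simple can wrongly merge two distinct same-class objects that both touch a cut line, so a practical version would need stricter criteria.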
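The change from SSD's loss to the focal loss can be illustrated by contrasting which predictions contribute. SSD's hard negative mining keeps all positive anchors and only the negatives with the highest confidence loss, at most 3 negatives per positive in the original SSD. The focal loss $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$ instead sums over every anchor, with $\alpha_t$ weighting positives against negatives and $(1 - p_t)^{\gamma}$ down-weighting easy examples. The sketch below assumes the binary (sigmoid) form with the RetinaNet defaults $\alpha = 0.25$, $\gamma = 2$ and normalization by the number of positives; the coefficients and the exact (possibly multi-class) form used in the thesis are not stated in the abstract. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def ssd_hard_negative_indices(conf_loss: Sequence[float], is_positive: Sequence[bool],
                              neg_pos_ratio: int = 3) -> List[int]:
    """SSD-style selection: keep every positive anchor plus the negatives with the
    highest confidence loss, at most neg_pos_ratio negatives per positive."""
    pos = [i for i, p in enumerate(is_positive) if p]
    neg = sorted((i for i, p in enumerate(is_positive) if not p),
                 key=lambda i: conf_loss[i], reverse=True)
    return sorted(pos + neg[: neg_pos_ratio * len(pos)])


def focal_loss(probs: Sequence[float], targets: Sequence[int],
               alpha: float = 0.25, gamma: float = 2.0, eps: float = 1e-7) -> float:
    """Binary focal loss summed over *all* anchors, divided by the number of positives.

    probs[i] is the predicted probability that anchor i belongs to the class;
    targets[i] is 1 for a positive anchor and 0 for a negative one.
    """
    total, num_pos = 0.0, 0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)             # avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability assigned to the true label
        alpha_t = alpha if y == 1 else 1.0 - alpha  # positive/negative weighting coefficient
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
        if y == 1:
            num_pos += 1
    return total / max(num_pos, 1)


if __name__ == "__main__":
    # Made-up toy values: anchor 0 is positive, anchors 1-6 are negatives.
    losses = [0.1, 2.0, 0.5, 0.9, 0.05, 1.5, 0.3]
    positive = [True, False, False, False, False, False, False]
    print(ssd_hard_negative_indices(losses, positive))  # [0, 1, 3, 5]
```

With $\gamma = 2$, an easy negative predicted at $p = 0.1$ (so $p_t = 0.9$) keeps only $(1 - 0.9)^2 = 0.01$ of its weighted cross-entropy, which is how the focal loss can include every prediction without the many easy negatives dominating the total.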