In recent years, with the rapid development of China’s social economy and marked advances in technology, unmanned aerial vehicles (UAVs) have been widely used in agriculture, aerial photography, industry, and other fields. UAVs offer small size, low cost, high flexibility, and simple operation. The imaging system of a UAV and its visual comprehension capability are key factors in its performance, so studying object detection in UAV scenarios is important for improving UAV perception. Aerial drone images differ from natural scene images and generally have the following characteristics: (1) complex background information: compared with natural scene images, aerial images are more susceptible to weather conditions and therefore contain more noise, which makes detection harder; (2) large variation in target scale: because UAVs fly at different heights, target scale varies widely in aerial images and the proportion of small targets is high; (3) high target aggregation: because UAV imaging perspectives differ, the spatial distribution of targets varies greatly, objects in dense areas are easily occluded, and object information is easily lost at the image edges. These characteristics make object detection in aerial images very challenging. Deep learning methods based on deep neural networks are currently the mainstream approach to object detection thanks to their excellent feature learning and representation capabilities. However, applying existing generic object detection algorithms directly to UAV images can give poor results because of the different imaging perspectives, the lack of training samples, more complex backgrounds, large variations in object size, the high proportion of small objects, and imbalanced training sample categories. To address these problems, this thesis proposes a
new small object detection method for aerial images based on the neighborhood attention Transformer, using deep learning and designed around the imaging characteristics of aerial photography. The main contributions and innovations are as follows: (1) As the network deepens, the effective feature information of small objects in aerial images diminishes, which makes detection harder. To address this, building on the mainstream general object detection algorithm YOLOv5, this thesis introduces the Neighborhood Attention Transformer (NAT) module, which, compared with traditional CNN modules, can relate local features at any two locations and can therefore capture longer-range context and more global information. NAT further strengthens local feature extraction by introducing location bias information while retaining the advantages of the Vision Transformer (ViT). In addition, this thesis replaces the Leaky Rectified Linear Unit (Leaky ReLU) activation function in the original convolutional neural network with the adaptive "activate or not" (ACON) activation function to further improve the generalization ability of the model. (2) This thesis proposes a feature fusion network with stronger learning ability, based on the path aggregation network (PANet). Object size varies greatly in drone images because of the different imaging views. To address this, this thesis proposes a two-space aggregation pyramid (TSAP) module combined with coordinate attention. TSAP extracts channel attention along the horizontal and vertical directions separately, preserving long-range dependencies together with accurate location information. Shallow network features contain more edge information and are therefore important for detecting small objects. Deeper layers have lower resolution and can detect large objects, but they lose more detail. The
two-space aggregation pyramid module generates location-sensitive feature maps by fusing high-level semantic features with low-level detail features, enhancing the network's ability to learn objects of different sizes. (3) This thesis constructs a new loss function that facilitates balanced training. To address the imbalance in the number of samples per category in UAV detection datasets, which leads to an unbalanced training process, this thesis combines Complete Intersection over Union (CIoU) loss, Binary Cross Entropy (BCE) loss, and Equalized Focal Loss (EFL); compared with the traditional Focal Loss, EFL can handle the imbalance among foreground categories. (4) To address the limited computational and storage resources that result from the small size of UAVs, this thesis proposes a lightweight convolutional encoder network (LCEN), built on the neighborhood attention Transformer method for small object detection in aerial images, by combining a convolutional neural network with the EdgeNeXt hybrid architecture. LCEN improves inference speed while maintaining model performance, which facilitates deployment on airborne edge computing devices. The experimental results show that the proposed method effectively addresses the problems of small object detection in aerial images. Comparison of detection results on two datasets shows that the proposed method outperforms other mainstream small object detection models for aerial images in terms of average accuracy. In particular, compared with the recent aerial object detection model TPH-YOLOv5, the average accuracy of the proposed method is improved by 6.72%.
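
For readers unfamiliar with two of the standard components named in this abstract, the following is a minimal pure-Python sketch of the ACON-C activation and the CIoU loss in their commonly published forms. It is an illustration only, not the implementation used in this thesis: the function names, the scalar (non-tensor) formulation, the (x1, y1, x2, y2) box format, the default parameter values, and the `eps` stabiliser are all assumptions made here. In a real network, `p1`, `p2`, and `beta` would be learnable parameters.

```python
import math


def acon_c(x, p1=1.0, p2=0.0, beta=1.0):
    """ACON-C activation: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x.

    The defaults are illustrative; with p1=1, p2=0, beta=1 this reduces to Swish.
    """
    d = (p1 - p2) * x
    return d / (1.0 + math.exp(-beta * d)) + p2 * x


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss (1 - CIoU) for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = wa * ha + wb * hb - inter + eps
    iou = inter / union

    # Distance term: squared centre distance divided by the squared
    # diagonal of the smallest box enclosing both boxes.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4.0

    # Aspect-ratio term v and its trade-off weight alpha.
    v = (4.0 / math.pi ** 2) * (
        math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))
    ) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - (iou - rho2 / c2 - alpha * v)
```

For identical boxes the loss is approximately zero. For non-overlapping boxes it exceeds 1, because the centre-distance penalty still provides a training signal when IoU is zero.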