| As the application scenarios of UAV aerial photography continue to expand, target detection algorithms for aerial images have become an important research topic in computer vision. Current deep learning target detection algorithms perform well on conventional images, but because aerial images are captured from a different perspective, applying existing algorithms to them directly gives unsatisfactory results. The low detection accuracy on aerial images has two main causes: the background of UAV aerial images contains a large amount of noise, which makes it difficult to extract target features completely, and the wide field of view leads to unevenly distributed targets, small target sizes, and poor feature extraction. This dissertation takes traffic targets in aerial images, such as pedestrians and cars, as its detection targets, and analyses representative UAV images to identify three basic problems for target detection in aerial images: complex backgrounds that easily produce occlusion; many small targets; and uneven target distribution caused by the wide field of view. Two target detection networks for aerial images are proposed, and the contributions of this dissertation are mainly reflected in the following three aspects: (1) To address the three basic problems of aerial image target detection, an improved aerial image target detection algorithm based on the YOLOv5s model is proposed. For the problem of many small targets in aerial images, a feature fusion network for small targets is proposed, so that the rich location information in the shallow feature maps of the network can be better combined with the feature maps extracted from the deeper layers. In view of 
the complex, occlusion-prone backgrounds of aerial images, and considering that occluded targets are hard to detect because their features are weakened, the algorithm adds a Transformer encoder module based on an improved residual network at the end of the backbone network to enhance the semantic features extracted from its deeper layers, thereby improving the detection of occluded targets. In view of the uneven distribution of targets and the dense targets that result from the wide field of view in aerial images, and considering the importance of key location information when detecting dense targets, the algorithm combines depthwise separable convolution with an attention mechanism, which aims to make the key location features of targets more prominent in the feature maps extracted from the shallow layers of the network, so that the model detects dense targets better. Controlled experiments on the dataset show that the method improves the detection accuracy metrics mAP50 and mAP0.5:0.95 by 5.3 and 4.8 percentage points, respectively, over the original model, and comparison experiments with other models show that the proposed method is more accurate than most existing models on aerial image target detection tasks. (2) A target detection algorithm for UAV aerial images based on weighted feature fusion is proposed. This algorithm is intended to offer alternative solutions to the problems in aerial image target detection and to provide different research ideas for solving similar problems in the future. For the small-target problem, the algorithm uses four convolutional layers in the backbone network and adds a residual module between successive convolutional layers to improve the learning ability 
of the model, so that the feature maps extracted at every feature extraction layer of the backbone network retain the location information of small targets. This enhances the network's feature representation of small targets in aerial images and thus improves small-target detection accuracy. To highlight the target area, a channel-spatial hybrid attention module is proposed, which mines the weakened features of occluded targets along both the channel and spatial dimensions and suppresses background noise, so that occluded targets are detected more accurately. For dense target detection, the location information in the shallow feature maps of the backbone network is useful for target localization, while the semantic information in the deep feature maps is useful for target classification. Inspired by residual networks and feature pyramids, the algorithm therefore proposes a weighted feature fusion method that adds an extra weight to each feature being fused and then fuses the weighted features together, making the key location information and high-level semantic information in the fused feature map more prominent and thereby improving dense target detection. (3) A target detection and counting program for a UAV platform is designed and tested. It detects ten common categories of targets in aerial images, such as pedestrians and vehicles, and puts the detection networks proposed in this dissertation into practical use: it detects targets in UAV aerial video streams and counts them by target type in order to accurately analyse the level of congestion in the current scene. |
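
The weighted feature fusion idea in contribution (2) can be illustrated with a minimal sketch. The abstract does not state how the weights are obtained or normalized, so the following is an assumption: it clamps the weights at zero and normalizes them by their sum, similar to the fast normalized fusion used in BiFPN. The names `weighted_fusion`, `shallow`, and `deep` are hypothetical, the feature maps are tiny single-channel grids, and the weights are fixed numbers here, whereas in a trained network they would presumably be learnable parameters.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # single-channel H x W map, for illustration only


def weighted_fusion(features: Sequence[FeatureMap],
                    weights: Sequence[float],
                    eps: float = 1e-4) -> FeatureMap:
    """Fuse same-sized feature maps as a normalized weighted sum.

    Assumption (not from the source): weights are clamped at zero and
    divided by their sum, so each input contributes a non-negative share.
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per feature map")
    clamped = [max(0.0, w) for w in weights]  # ReLU-style clamp
    total = sum(clamped) + eps                # eps avoids division by zero
    rows, cols = len(features[0]), len(features[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for fmap, w in zip(features, clamped):
        share = w / total
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += share * fmap[r][c]
    return fused


# Hypothetical inputs: a shallow map (location cues) and a deep map
# (semantic cues), already resized to the same resolution.
shallow = [[1.0, 0.0], [0.0, 1.0]]
deep = [[0.0, 2.0], [2.0, 0.0]]
fused = weighted_fusion([shallow, deep], weights=[3.0, 1.0])
```

With weights of 3 and 1, the shallow map contributes roughly three quarters of each fused value, which is one way the fusion could emphasize location information for dense targets.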