| Deep learning has driven the rapid development of target detection technology. Current models achieve remarkable results on large and medium-sized targets, but small target detection remains difficult. Small targets occupy few pixels, so discriminative features are hard to extract, which easily leads to missed and false detections. Small targets also tend to be densely distributed and easily disturbed by the environment, which further increases detection difficulty. Yet small target detection is indispensable in practice: in UAV aerial scenes, for example, it plays an important role in national defense security, city management, resource exploration, disaster rescue, and many other fields. This paper takes the lightweight YOLOv5 model, YOLOv5s, as its main research object and proposes the CE-O-YOLOv5s model to address the difficulties of small target detection and the shortcomings of YOLOv5 in small target scenes. The main contributions are as follows: (1) The multiscale feature fusion network of YOLOv5 is improved. First, the prediction scales of YOLOv5 are optimized: an additional upsampling layer is added to fuse feature information from a shallower layer of the backbone network, building a detection layer for smaller targets. The C3 module and the convolution module at the end of the PAN are then removed to reduce the negative impact of scale mismatch while lowering the parameter count and computational load. A skip connection from the backbone network to the PAN is introduced to obtain richer feature information, and weighted Concat feature fusion is used to weigh the importance of different information. The resulting O-YOLOv5s model retains the lightweight advantage of YOLOv5s and significantly improves small target detection performance. (2) Based on the O-YOLOv5s model, the ConvMixer CBAM module is used to
enhance the feature representation of small targets, and an Efficient decoupled head is used to decouple the classification and regression tasks. ConvMixer CBAM is a lightweight and efficient feature enhancement module proposed in this paper. It uses depthwise convolution with large kernels together with pointwise convolution to separate spatial mixing from channel mixing, so that relationships between features are captured with fewer parameters and less computation. In small target detection, the ConvMixer module tends to capture rich contextual information, and the embedded CBAM attention mechanism helps the network process effective features more finely, promoting the integration of small target details with contextual information. The Efficient decoupled head resolves the conflict between the classification and regression tasks: it speeds up model convergence and improves detection precision while keeping the added computation as small as possible. With the above improvements combined, the CE-O-YOLOv5s model reaches an mAP@0.5 of 42.1% on VisDrone2019-DET-val, an 8.1% improvement over the base model YOLOv5s, and an mAP@0.5 of 35.1% on VisDrone2019-DET-test-dev, a 5.8% improvement over YOLOv5s. |
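
The parameter saving from separating spatial mixing (depthwise convolution) from channel mixing (pointwise convolution) can be illustrated with a short count. This is a minimal sketch, not the configuration used in the paper: the channel width of 256 and the 7x7 kernel are hypothetical values chosen only for illustration, and the count covers the convolution layers alone, leaving out normalization, activation, and the CBAM attention branch.

```python
def conv_params(c_in, c_out, k, groups=1, bias=True):
    """Learnable parameters of a 2-D convolution with a k x k kernel."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel sees only c_in / groups input channels.
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def separable_params(channels, k, bias=True):
    """Depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    # Depthwise: one k x k filter per channel (groups == channels), spatial mixing only.
    depthwise = conv_params(channels, channels, k, groups=channels, bias=bias)
    # Pointwise: 1 x 1 convolution across channels, channel mixing only.
    pointwise = conv_params(channels, channels, 1, bias=bias)
    return depthwise + pointwise


# Hypothetical layer size, not taken from the paper.
channels, kernel = 256, 7

standard = conv_params(channels, channels, kernel)
separable = separable_params(channels, kernel)

print(f"standard {kernel}x{kernel} conv : {standard:,} parameters")
print(f"depthwise + pointwise : {separable:,} parameters")
print(f"reduction factor      : {standard / separable:.1f}x")
```

Under these assumed sizes the separated form needs far fewer parameters than a dense convolution with the same kernel, which is why a large kernel remains affordable in a lightweight detector.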