| In the information age, image object detection is an important research field with applications in areas such as autonomous driving, unmanned monitoring, and intelligent medical diagnosis. However, current mainstream object detection models still suffer from inaccurate localization of predicted boxes and weak feature discrimination, which lead to low detection accuracy and a high missed-detection rate in complex scenes. To improve detection performance, this thesis builds on the mainstream object detection model YOLOv3, and the main work is as follows. To address the inaccurate bounding-box localization of YOLOv3, the loss function of the model is improved. First, an auxiliary box is defined by the center point of the predicted box and the width and height of the ground-truth box; the intersection over union (IoU) between the predicted box and the auxiliary box is then added to the regression loss as a penalty term, helping the predicted box regress toward the ground-truth box. To further improve performance, Focal Loss is introduced into the confidence loss, increasing the contribution of positive samples and hard-to-learn samples so that the model learns more of their characteristics. Experiments on the public detection dataset PASCAL VOC show that these improvements raise both detection accuracy and box regression accuracy, increasing the mAP to 83.40%, which is higher than that of some mainstream loss-function improvements such as CIoU and EIoU. To address the weak feature discrimination of YOLOv3 in complex scenes, a Union Attention Module is proposed, in which the input feature map is repeatedly squeezed and excited along the spatial dimension and
channel dimension, yielding a weight matrix of the same size as the input feature map. The Hadamard product of the weight matrix and the input feature map is then computed to reweight each pixel of the input feature map, enhancing useful features and suppressing unhelpful ones. With this attention module added to YOLOv3, performance on the PASCAL VOC dataset exceeds that of some mainstream attention models such as DANet and CANet. Visualization results show that the proposed model detects objects more reliably and produces fewer missed detections. In addition, aerial video captured by unmanned aerial vehicles at the construction site of a water conservancy hub is converted into a set of images, and, according to construction management needs, the six types of targets to be monitored are annotated with labeling software to produce an aerial image dataset. The proposed model is trained and tested on this dataset, and the experimental results show that it reaches an mAP of 88.63%, which is higher than that of some mainstream object detection algorithms such as CenterNet and YOLOX-l. |
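The loss-function changes summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the following are assumptions made only for illustration: the penalty is taken as `1 - IoU(prediction, auxiliary box)` and simply added to a plain `1 - IoU` regression term, the focal term uses the standard form `-alpha_t * (1 - p_t)^gamma * log(p_t)`, and the values `alpha=0.75` and `gamma=2.0` are hypothetical. All function names are illustrative and are not taken from the thesis.

```python
import math


def iou_cxcywh(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def auxiliary_box(pred, gt):
    """Auxiliary box: center of the predicted box, width/height of the ground truth."""
    return (pred[0], pred[1], gt[2], gt[3])


def regression_loss(pred, gt):
    """Assumed form: (1 - IoU(pred, gt)) plus an auxiliary-box penalty (1 - IoU(pred, aux))."""
    aux = auxiliary_box(pred, gt)
    return (1.0 - iou_cxcywh(pred, gt)) + (1.0 - iou_cxcywh(pred, aux))


def focal_confidence_loss(p, target, alpha=0.75, gamma=2.0, eps=1e-7):
    """Standard focal loss on one confidence score p in (0, 1).

    alpha and gamma are hypothetical values; alpha > 0.5 weights positives more.
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if target == 1 else 1.0 - p      # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

In this sketch, a prediction `(0, 0, 2, 2)` that does not overlap a ground-truth box `(10, 10, 2, 4)` has zero IoU with the ground truth, yet the auxiliary term still reflects the height mismatch, and `regression_loss` evaluates to 1.5. Likewise, `focal_confidence_loss(0.1, 1)` is much larger than `focal_confidence_loss(0.9, 1)`, so hard positives dominate the confidence loss.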
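The reweighting step of the attention module can be sketched in the same spirit. This is not the Union Attention Module itself: the learned excitation layers and the repeated squeeze-excitation passes are omitted, and a parameter-free sigmoid gate is assumed purely to show how a channel descriptor and a spatial descriptor can be combined into a weight tensor of the same size as the input and applied through a Hadamard product. The function name and the nested-list `[C][H][W]` layout are illustrative choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def union_attention_sketch(fmap):
    """Reweight a feature map given as nested lists of shape [C][H][W].

    Returns (weights, out), both with the same shape as fmap.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])

    # Channel squeeze: average each channel over all spatial positions,
    # then gate with a sigmoid (stand-in for a learned excitation).
    channel_w = [sigmoid(sum(sum(row) for row in ch) / (H * W)) for ch in fmap]

    # Spatial squeeze: average over channels at each pixel, then gate.
    spatial_w = [
        [sigmoid(sum(fmap[c][h][w] for c in range(C)) / C) for w in range(W)]
        for h in range(H)
    ]

    # Weight tensor with the same size as the input feature map.
    weights = [
        [[channel_w[c] * spatial_w[h][w] for w in range(W)] for h in range(H)]
        for c in range(C)
    ]

    # Hadamard (element-wise) product reweights every element of the input.
    out = [
        [[fmap[c][h][w] * weights[c][h][w] for w in range(W)] for h in range(H)]
        for c in range(C)
    ]
    return weights, out
```

Every weight produced this way lies between 0 and 1, so positions with a strong channel response and a strong spatial response are attenuated least, while weak positions are suppressed more.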