Current mainstream deep-learning-based object detection algorithms can be roughly divided into two categories according to their processing strategy: two-stage detectors, represented by the R-CNN series, and one-stage detectors, represented by the YOLO series. Early one-stage detectors had a clear speed advantage over two-stage detectors but lower recognition accuracy; with the continuous updating and optimization of the YOLO series, by the time of YOLOv5 the excellent performance of the model, in both accuracy and speed, had been widely demonstrated. Taking YOLOv5, one of the most influential algorithms in the field of object detection, as its object of study, this paper conducts model lightweighting research to reduce the model's demands on hardware computing resources, then analyzes the different types of data imbalance that may arise when training the YOLOv5 model and optimizes for them, so as to design a more efficient and higher-performance object detection network. Specifically, the work of this paper can be summarized as follows:

(1) This paper compares the merits and demerits of two approaches to designing efficient convolutional neural networks, manual design and neural architecture search, and ultimately chooses manual design. Drawing on existing model lightweighting experience, the network structure characteristics of YOLOv5, and the design patterns of efficient neural units, this paper proposes several different Bottleneck structures to replace the corresponding original structures in YOLOv5. Under the premise that the structure apart from the Bottleneck layer is identical and all hyperparameters are the same, several sets of comparative experiments with different Bottleneck structures are set up, all based on the COCO dataset. The experimental
results show that IBN, Tucker, SEGBottleneck and Split IBN all achieve good lightweighting, reducing the parameter count by 25.1%, 23.1%, 21.2% and 14.2%, respectively. Split IBN is the best overall: it lightens the model while also improving its accuracy, raising mAP@0.5 by 0.4%.

(2) In object detection, besides the convolutional neural network structure itself, ensuring the data balance of the detection network is also crucial to the performance of the final model. Through research and analysis of four types of data imbalance, i.e., spatial imbalance, category imbalance, scale imbalance and object imbalance, corresponding solutions are adopted in this work. For spatial imbalance and object imbalance, this paper adopts the recent Focal-EIoU loss function. Category imbalance is addressed by tuning the hyperparameters of Focal Loss, and, based on the characteristics of category imbalance, a way to choose the model training strategy in practical applications is proposed. For scale imbalance, this paper improves the original Neck layer, i.e., the PANet structure, by borrowing the design idea of BiFPN: one and two additional feature information streams (Neck-BiFPN1 and Neck-BiFPN2) are added on top of the original Neck structures P4 and P5, respectively. Experiments show that Neck-BiFPN1 works best, increasing the parameter count by only 0.9% while raising mAP@0.5 by 1%.

(3) Considering the actual deployment of the object detection model, after the network structure design is completed, this paper studies the activation function and the training strategy of the model, so that subsequent optimization work can be carried out more promptly, more conveniently, and at lower cost. Different activation functions have different effects on the
feature extraction of the object detection network, and the computational cost of an activation function also differs across platforms; for instance, SiLU is expensive on mobile devices because it requires computing a sigmoid. Comparative experiments with various activation functions show that the effect of Hard Swish is close to that of SiLU, but its computational cost is much lower. Through continuous iterative updating of the pre-trained model, it is found that faster model convergence can be achieved on the premise that the object detection network structure remains consistent.
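As an illustration of the regression loss adopted for spatial and object imbalance in (2), the sketch below computes the EIoU loss terms (IoU, normalized center distance, and width/height penalties) for axis-aligned boxes in pure Python. It is a minimal rendering of the published EIoU formula under the assumption of non-degenerate boxes, not the YOLOv5 implementation, and the Focal-EIoU re-weighting is omitted.

```python
def eiou_loss(box, gt):
    # Boxes as (x1, y1, x2, y2). EIoU extends the IoU loss with penalties
    # on the center distance and on the width/height gaps, each normalized
    # by the smallest enclosing box.
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    # intersection / union
    iw = max(0.0, min(x2, gx2) - max(x1, gx1))
    ih = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # smallest enclosing box
    cw = max(x2, gx2) - min(x1, gx1)
    ch = max(y2, gy2) - min(y1, gy1)
    # squared center distance, normalized by the enclosing diagonal
    dx = (x1 + x2) / 2 - (gx1 + gx2) / 2
    dy = (y1 + y2) / 2 - (gy1 + gy2) / 2
    dist = (dx * dx + dy * dy) / (cw * cw + ch * ch)
    # width / height penalties
    pw = ((x2 - x1) - (gx2 - gx1)) ** 2 / (cw * cw)
    ph = ((y2 - y1) - (gy2 - gy1)) ** 2 / (ch * ch)
    return 1.0 - iou + dist + pw + ph
```

A perfectly matched prediction yields zero loss, and every extra term keeps contributing gradient even when the boxes do not overlap, which is the motivation for using EIoU over plain IoU loss.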
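For the category-imbalance point in (2), the binary form of Focal Loss can be sketched as follows; `gamma` and `alpha` are the hyperparameters being tuned, and the default values below are the common ones from the original Focal Loss formulation, not necessarily those used in this work.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # Binary focal loss: down-weights easy, well-classified examples so
    # that rare classes contribute more to the gradient.
    # p = predicted probability of the positive class, y = label in {0, 1}.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma = 0` the modulating factor disappears and the loss reduces to alpha-weighted cross-entropy; increasing `gamma` shifts the training signal toward hard examples of under-represented categories.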
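The BiFPN design idea borrowed for the Neck-BiFPN1/2 variants rests on weighted fusion of the incoming feature streams. A minimal sketch of BiFPN-style "fast normalized fusion" over flat feature vectors (the real network fuses multi-channel feature maps and learns the weights by gradient descent) might look like:

```python
def weighted_fusion(features, weights, eps=1e-4):
    # BiFPN fast normalized fusion: clamp the learnable weights to be
    # non-negative, normalize them to sum to ~1, then blend the streams.
    w = [max(wi, 0.0) for wi in weights]
    s = sum(w) + eps
    return [sum(wi * f[i] for wi, f in zip(w, features)) / s
            for i in range(len(features[0]))]
```

Adding an extra information stream at P4 or P5, as in Neck-BiFPN1 and Neck-BiFPN2, amounts to giving this fusion one more input (and one more weight) at that level.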
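The activation-function comparison in (3) can be made concrete: Hard Swish replaces the sigmoid inside SiLU with a clipped linear ramp, so it needs only comparisons, additions, and multiplications rather than an exponential. A scalar sketch:

```python
import math

def silu(x):
    # SiLU (a.k.a. Swish): x * sigmoid(x); the sigmoid is what makes it
    # comparatively expensive on mobile hardware
    return x * (1.0 / (1.0 + math.exp(-x)))

def hard_swish(x):
    # Hard Swish: piecewise-linear approximation of SiLU built from the
    # ReLU6-style clip min(max(x + 3, 0), 6) / 6 -- no exponential needed
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0
```

The two curves stay close over the typical activation range, which matches the experimental finding that Hard Swish achieves nearly the same accuracy as SiLU at a much lower computational cost.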