| Object detection is a technology that quickly locates predetermined targets in images and videos, and it is widely used in daily life. Its applications are broad: in epidemic prevention, it can detect whether pedestrians in heavily trafficked scenes are wearing masks, effectively reducing the labor cost of epidemic prevention; in big data, statistics on peak traffic flow can be used to build better smart travel strategies. In recent years, as driverless technology has matured, the research focus of object detection has gradually shifted from traditional applications to vehicle and pedestrian detection in complex scenes. Based on the YOLOv4 algorithm, this paper optimizes the network structure to address low detection performance and proposes a lightweight strategy to address model redundancy, so that the model can be better deployed on resource-constrained embedded devices. The main contents of the paper are as follows: 1. To address the low performance of the YOLOv4 model, this paper proposes a custom convolution sequence based on deformable convolution and depthwise separable convolution to optimize the traditional feature enhancement network in YOLOv4. During upsampling and downsampling, the custom convolution sequence replaces the traditional, more complex convolution operations, which improves detection accuracy for irregularly shaped objects while reducing the computational cost of convolution. To further improve model performance, a CBAM attention module is added before the YOLO Head to focus the model's attention on specific regions. Compared with the traditional YOLOv4 network, the improved model achieves faster detection speed and higher detection accuracy on the KITTI vehicle and pedestrian detection dataset, and the number of model parameters is also slightly reduced. The experimental results show that the improved method achieves higher detection accuracy for both occluded targets and irregularly shaped targets in vehicle and pedestrian detection scenes. 2. To address model redundancy and the large number of parameters, this paper uses the lightweight MobileNetV3 network instead of the traditional CSPDarknet53 structure for feature extraction, which greatly reduces the number of model parameters. The ReLU activation function is replaced with the Mish activation function, whose gradient is smoother, to improve generalization and to solve the negative gradient truncation problem. Dilated convolutions with different kernel sizes replace the max pooling operations in the original SPP structure, which not only enlarges the receptive field but also reduces the loss of weak semantic information. The final model was validated on the KITTI vehicle and pedestrian detection dataset, where it further improved detection speed and reduced model size. Experimental results show that the final model is better suited to resource-constrained embedded devices in complex scenarios. |
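
Two of the building blocks named in the abstract can be illustrated with a few lines of plain Python. The abstract gives no implementation details, so the following is only a minimal sketch under stated assumptions: convolutions are taken to be bias-free, the channel sizes (256 in, 512 out, 3×3 kernel) are hypothetical values chosen for illustration, and the function names are not taken from the paper. It shows why a depthwise separable convolution needs fewer parameters than a standard convolution, and how Mish, defined as x · tanh(softplus(x)), keeps a small nonzero response for negative inputs where ReLU outputs zero.

```python
import math


def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted by assumption)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def relu(x: float) -> float:
    """ReLU: negative inputs are truncated to zero."""
    return max(0.0, x)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to make the comparison concrete.
    standard = conv_params(256, 512, 3)
    separable = depthwise_separable_params(256, 512, 3)
    print(f"standard: {standard}, separable: {separable}")

    # Compare the two activations on a few sample inputs.
    for x in (-2.0, -1.0, 0.0, 1.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):.4f}")
```

For these hypothetical sizes the separable layer has roughly one ninth of the weights of the standard layer, which is consistent with the abstract's claim of a reduced parameter count, although the actual savings in the paper depend on its real layer configuration.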