Computer vision is a discipline that simulates biological vision by processing images and extracting information from them. Its main directions include object detection, image segmentation, and person re-identification. Object detection is a fundamental direction in computer vision: it uses information extracted from an image to determine the position of each object and its category. With the development of deep learning and hardware, more and more computer vision research has been turned into products used in daily life and production. Many automobile companies are currently working on driverless technology, and vehicle and pedestrian detection, an essential function of driverless vehicles, has received wide attention. Many difficult problems remain, however. These include mutual occlusion between pedestrians, the small number of pixels occupied by traffic lights, and dense traffic on congested road sections. To address these problems, this paper improves the YOLOv5s algorithm by introducing one-shot aggregation features and the diverse branch block. The improved algorithm is suitable for real-time vehicle and pedestrian detection on GPU devices with high computing power. In addition, this paper combines GhostNetV2 and re-parameterized depthwise separable convolution to make the model lightweight. The specific improvements are as follows:

1. An improved YOLOv5s algorithm combining one-shot aggregation features and the diverse branch block is proposed for vehicle and pedestrian detection in driverless scenarios. The algorithm introduces a one-shot aggregation (OSA) module into the backbone network to improve feature extraction. The OSA module is more concise than the CSP module, whose residual connection requires an additional 1×1 convolution. In its PAN structure, YOLOv5s deletes the fusion nodes that have only one input branch. Although this reduces the number of parameters and the amount of computation, it causes a loss of detail. Therefore, horizontal fusion nodes are added at the shallow layers to increase the proportion of detail information in the fused features. This reduces false and missed detections of targets such as pedestrians and traffic signs. A re-parameterized multi-branch module is introduced into the Neck. It uses average pooling and convolution kernels with different receptive fields to enhance feature fusion. At inference time, the module is converted into a single branch, so it does not slow down inference. The YOLOV5s_OED model proposed in this paper improves mAP on the BDD100K dataset by 8.1%.

2. Building on the improved YOLOv5s algorithm described in item 1, which combines one-shot aggregation features and the diverse branch block, lightweight improvements are made so that the model can be deployed on edge devices. The Ghost S module, based on the ShuffleNet idea, and the Ghost module based on an attention mechanism replace the ordinary convolutions in the original backbone network. Depthwise separable convolutions are used in the backbone's one-shot aggregation modules, yielding a lightweight backbone network. In the Neck structure, the number of channels in each layer grows as the network deepens. The improved Neck upsamples features directly along the top-down path and no longer compresses the feature channels beforehand. When the features of all layers are fused, the number of channels is compressed to 128, which reduces computation and the number of parameters. To prevent the loss of fused features caused by channel compression, re-parameterized 1×1 convolutions and 3×3 depthwise convolutions are used to improve feature fusion. Compared with YOLOV5s_OED, the lightweight model reduces the number of parameters by 72% and the amount of computation by 56%. Its mAP on the BDD100K dataset decreases by 4.7%.
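Simple channel and parameter bookkeeping can illustrate the one-shot aggregation idea. The sketch assumes a VoVNet-style OSA layout: a chain of k×k convolutions whose outputs are concatenated only once, together with the module input, and then fused by a single 1×1 convolution. The thesis's exact module and channel widths are not given here. The names `osa_module` and `conv_params` are hypothetical, and bias and BatchNorm weights are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias and BatchNorm omitted)."""
    return c_in * c_out * k * k


def osa_module(c_in, c_mid, n_layers, c_out, k=3):
    """Channel and parameter bookkeeping for a one-shot aggregation (OSA) module.

    Assumed VoVNet-style layout: n_layers k x k convs applied in sequence,
    then the module input and every intermediate output are concatenated
    once and fused by a single 1 x 1 convolution.
    Returns (channels entering the 1 x 1 fusion conv, total weights).
    """
    params = 0
    c_prev = c_in
    concat_channels = c_in  # the module input joins the final concatenation
    for _ in range(n_layers):
        params += conv_params(c_prev, c_mid, k)  # each layer sees only the previous output
        concat_channels += c_mid                 # its output is kept for the one-shot concat
        c_prev = c_mid
    params += conv_params(concat_channels, c_out, 1)  # single aggregation step
    return concat_channels, params


print(osa_module(64, 32, 3, 128))  # (160, 57344)
```

For example, `osa_module(64, 32, 3, 128)` feeds 64 + 3 × 32 = 160 channels into the final 1×1 convolution. The concatenation happens only once, so each intermediate convolution sees only `c_mid` input channels rather than a growing stack of earlier outputs.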
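The re-parameterized multi-branch module can become a single branch at inference because the branches are linear. Assume each branch's BatchNorm has already been folded into its weights. A 1×1 convolution is then a 3×3 kernel that is zero except at the centre. A 3×3 average pooling with stride 1, where zero padding counts in the divisor, is a 3×3 kernel filled with 1/9. The pure-Python sketch below checks this merge for a single channel, which is one channel of a depthwise convolution. The same reasoning would cover a 3×3 depthwise convolution with a parallel per-channel 1×1 branch, if the lightweight model's re-parameterized block is arranged that way (an assumption). The branch set, the padding convention, and all function names are illustrative assumptions, not the thesis's exact diverse branch block.

```python
# Structural re-parameterization sketch: three parallel branches on one channel
# (3x3 conv, 1x1 conv, 3x3 average pooling) are folded into a single 3x3 conv.
# Assumes BatchNorm has already been folded into each branch's weights and bias.


def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2-D cross-correlation, stride 1, zero padding keeping H x W."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    if 0 <= y < h and 0 <= x < w:  # pixels outside are zero padding
                        out[i][j] += kernel[u][v] * image[y][x]
    return out


def avg_pool_3x3(image):
    """3x3 average pooling, stride 1, zero padding, divisor always 9."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for y in range(i - 1, i + 2):
                for x in range(j - 1, j + 2):
                    if 0 <= y < h and 0 <= x < w:
                        s += image[y][x]
            out[i][j] = s / 9.0
    return out


def multi_branch(image, k3, b3, w1, b1):
    """Training-time form: sum of the three parallel branches."""
    a = conv2d_same(image, k3, b3)
    b = conv2d_same(image, [[w1]], b1)
    c = avg_pool_3x3(image)
    return [[a[i][j] + b[i][j] + c[i][j] for j in range(len(image[0]))]
            for i in range(len(image))]


def merge_branches(k3, b3, w1, b1):
    """Inference-time form: one 3x3 kernel and one bias."""
    merged = [[k3[u][v] + 1.0 / 9.0 for v in range(3)] for u in range(3)]  # avg pool = 1/9 kernel
    merged[1][1] += w1  # a 1x1 kernel sits at the centre of a 3x3 kernel
    return merged, b3 + b1


img = [[float((3 * i + 5 * j) % 7) for j in range(5)] for i in range(4)]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
train_out = multi_branch(img, k3, 0.05, 0.7, -0.02)
kernel, bias = merge_branches(k3, 0.05, 0.7, -0.02)
infer_out = conv2d_same(img, kernel, bias)
max_diff = max(abs(p - q) for rp, rq in zip(train_out, infer_out) for p, q in zip(rp, rq))
print("max difference:", max_diff)
```

The single merged convolution reproduces the sum of the three branches at every pixel, including the zero-padded border. This is why the extra branches add training-time capacity without adding inference-time cost.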
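A short calculation can estimate the parameter savings of a Ghost module. The sketch assumes the original GhostNet formulation. A primary k×k convolution produces ⌈c_out / ratio⌉ intrinsic feature maps. A cheap depthwise `dw_size`×`dw_size` convolution then generates the remaining "ghost" maps from them. The sketch does not model the attention branch of GhostNetV2 or the ShuffleNet-style design of the Ghost S module. Bias and BatchNorm weights are ignored, and the function names are hypothetical.

```python
import math


def conv_params(c_in, c_out, k):
    """Weight count of an ordinary k x k convolution (bias and BatchNorm omitted)."""
    return c_in * c_out * k * k


def ghost_params(c_in, c_out, k=1, ratio=2, dw_size=3):
    """Weight count of a GhostNet-style Ghost module (assumed formulation)."""
    init = math.ceil(c_out / ratio)    # intrinsic maps from the primary k x k conv
    new = init * (ratio - 1)           # ghost maps from the cheap depthwise conv
    primary = conv_params(c_in, init, k)
    cheap = new * dw_size * dw_size    # each ghost map: one dw_size x dw_size filter on one intrinsic map
    # Note: init + new may exceed c_out when c_out is not divisible by ratio;
    # the surplus maps are sliced off but their weights are still counted here.
    return primary + cheap


print(conv_params(128, 128, 3), ghost_params(128, 128, k=3))  # 147456 74304
```

With the default `ratio=2`, a Ghost module whose primary convolution is 3×3 needs 74,304 weights for 128 input and 128 output channels. An ordinary 3×3 convolution with the same channel counts needs 147,456, so the Ghost module uses roughly half.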
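Depthwise separable convolution splits a k×k convolution into two steps. A per-channel k×k depthwise convolution comes first, followed by a 1×1 pointwise convolution that mixes channels. The sketch below compares weight counts and multiply-accumulate operations (MACs) for stride 1 and "same" padding. The layer sizes in the example are arbitrary and not taken from the thesis. Bias and BatchNorm are ignored, and the function names are hypothetical.

```python
def standard_conv_cost(c_in, c_out, k, h, w):
    """(weights, MACs) of an ordinary k x k conv, stride 1, same padding."""
    params = c_in * c_out * k * k
    return params, params * h * w  # every output pixel uses every weight once


def depthwise_separable_cost(c_in, c_out, k, h, w):
    """(weights, MACs) of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    params = depthwise + pointwise
    return params, params * h * w


print(standard_conv_cost(128, 128, 3, 40, 40))        # (147456, 235929600)
print(depthwise_separable_cost(128, 128, 3, 40, 40))  # (17536, 28057600)
```

Take 128 input and output channels, a 3×3 kernel, and a 40×40 feature map. The depthwise separable version needs 17,536 weights instead of 147,456, about 8.4 times fewer. This matches the usual cost ratio of 1/c_out + 1/k² ≈ 0.119.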