Significant progress has been made in pedestrian detection for intelligent transportation, but some difficult problems remain, one of which is pedestrian detection under complex lighting conditions. Traditional visible-light images are strongly affected by illumination: in good daytime lighting they are clear and support accurate pedestrian detection, but in low light their quality degrades and they can no longer provide sufficient, effective information about pedestrian targets, so detection performance drops sharply. Infrared images can compensate for this missing pedestrian information. This paper therefore focuses on pedestrian detection with mixed-modal images and proposes a multi-scale mixed-modal pedestrian detection algorithm for intelligent transportation based on the YOLOv4 algorithm, which fuses visible and infrared image features to achieve robust pedestrian detection in complex road scenes. The research covers the following three aspects:

(1) A parallel dual-stream feature extraction network is introduced into the YOLOv4 algorithm, and two mixed-modal feature fusion methods are proposed: channel stacking and self-learning weighted summation. Experimental results show that the channel-stacking fusion method improves average precision by 5.07% and reduces the log-average miss rate by 6.92% compared with visible-only detection.

(2) A quadruple snowflake transform and pseudo-target embedding are proposed to augment the original training dataset, generating a large number of new samples from a limited dataset and thereby improving system robustness.

(3) A feature-sharing learning network, FSNet, is proposed to achieve optimal fusion
between mixed-modal features: the two parallel feature extraction networks of the two modalities exchange feature matrices with each other, guiding one another to learn, in a targeted manner, the features both modalities need from the images, supplemented by an Inception structure and a cross-modal channel attention mechanism, CMSE, to help capture richer semantics. Experimental results show that this method reduces the log-average miss rate by a further 2.8% compared with the channel-stacking fusion method.

In summary, this paper proposes a multi-scale road pedestrian detection method for mixed visible and infrared modalities based on the YOLOv4 algorithm, and achieves solid results through extensive experiments and continuous refinement.
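The two fusion strategies described in aspect (1) can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual implementation: the function names, the `(C, H, W)` feature-map layout, and the sigmoid-normalized scalar weights are all assumptions made for the sake of the example.

```python
import numpy as np

def channel_stack(f_vis, f_ir):
    """Channel-stacking fusion: concatenate the visible and infrared
    feature maps along the channel axis, doubling the channel count.
    In a real network a 1x1 convolution would typically follow to
    restore the original channel width."""
    return np.concatenate([f_vis, f_ir], axis=0)

def weighted_sum(f_vis, f_ir, w_vis, w_ir):
    """Self-learning weighted summation: each modality's feature map is
    scaled by a scalar weight and the results are added. Sigmoid keeps
    each weight in (0, 1); during training, w_vis and w_ir would be
    learnable parameters updated by backpropagation."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return sigmoid(w_vis) * f_vis + sigmoid(w_ir) * f_ir

# Toy feature maps: 4 channels at 8x8 spatial resolution.
rng = np.random.default_rng(0)
f_vis = rng.standard_normal((4, 8, 8))
f_ir = rng.standard_normal((4, 8, 8))

stacked = channel_stack(f_vis, f_ir)          # channels double: (8, 8, 8)
summed = weighted_sum(f_vis, f_ir, 0.0, 0.0)  # channels preserved: (4, 8, 8)
print(stacked.shape, summed.shape)
```

Note the trade-off the two methods embody: channel stacking preserves all features from both modalities but increases channel width, while weighted summation keeps the width fixed and lets the network learn how much to trust each modality.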