| In recent years, the Intelligent Transportation System (ITS) has received extensive attention from researchers in many fields as an effective means of solving traffic problems. Accurate, real-time segmentation of urban dynamic traffic scenes is an important task in the perception stage of ITS. Existing segmentation networks are mainly designed for visible light images, which play an important role in segmentation because of their rich feature information. However, segmentation networks designed for visible images depend heavily on lighting conditions. In dynamic traffic scenes, they provide good segmentation results when light is sufficient, but suffer from insufficient accuracy and robustness in challenging environments such as glare and night. Thermal infrared cameras image objects using the thermal energy they emit, which removes the dependence on an illumination source and makes imaging stable. Moreover, the intensity of thermal radiation received by the infrared camera reflects the temperature of the object, so the camera is highly sensitive to major participants in urban dynamic traffic scenes such as vehicles and pedestrians. On the other hand, although thermal infrared images have the advantage of stability, they also have disadvantages such as sparse feature information, cluttered noise, and thermal crossover. To address these problems, this paper integrates the characteristics of visible and thermal infrared images, combines multispectral image features with global edge features, and investigates a semantic segmentation network for dynamic traffic scenes that is accurate and robust to lighting conditions. The main research contents of this paper are as follows: (1) Based on research on deep convolutional networks, a semantic segmentation network for visible light RGB images (MNet) is built. Using the encoder-decoder structure, this paper designs a high-performance encoder and a multi-level 
skip connection decoder. To preserve both the learning ability and the feature extraction speed of the encoder, this paper uses a ResNet optimized with asymmetric convolution as the encoding structure. Because continuous downsampling causes substantial information loss and a mismatch between encoding and decoding features, this paper designs a multi-level skip connection structure for the decoder. This structure removes the limitation of single-layer skip connections, enables flexible feature fusion, and preserves necessary information. Finally, experiments verify the performance of MNet and compare it with mainstream single-modal networks. The results show that MNet has the best segmentation accuracy among the compared networks and is also competitive in inference speed. The experiments also show that the segmentation quality of the network needs to be improved when lighting conditions in dynamic traffic scenes are poor. (2) Based on research on edge features, a semantic segmentation network for thermal infrared images (MNet-E) is built. Building on MNet, this paper uses thermal infrared images as the data source to study a semantic segmentation algorithm that is robust to challenging lighting in dynamic traffic scenes. To address the shortcomings of thermal infrared images, this paper uses global edge features as prior knowledge to construct an edge optimization module. This module optimizes the segmentation results by deeply fusing global edge features with multiple decoding features. Finally, the performance of MNet-E is tested experimentally and compared with mainstream single-modal segmentation networks. The experimental results show that MNet-E has better overall segmentation performance on thermal infrared data, and that the edge optimization module significantly improves the segmentation accuracy of the network. Although segmentation methods based on thermal infrared images significantly improve segmentation accuracy in challenging 
lighting scenes, the overall segmentation performance still needs to be improved. (3) Based on multispectral data fusion, an accurate and robust semantic segmentation network for dynamic traffic scenes (MDFNet) is built. To further improve the accuracy and robustness of segmentation, this paper builds on the established networks, considers the characteristics of both visible light and thermal infrared images, and achieves data complementarity through deep feature fusion. To obtain fusion features with stronger representation ability, this paper adopts a dual-branch encoding structure to extract features from visible light and thermal infrared images separately, and uses the SE strategy, which has the advantage of targeted learning, to fuse and optimize the encoding features. The fused features are used as input to the edge detection structure and the decoder to improve the performance of edge detection and segmentation. Finally, this paper validates the segmentation performance of MDFNet and compares it with classic single-modal and multi-modal segmentation networks. The experimental results show that MDFNet has the best performance in segmentation accuracy and segmentation edge processing, and better balances segmentation accuracy and real-time performance. The experiments also show that, compared with single-modality methods, the method based on multispectral data fusion significantly improves segmentation accuracy and adaptability to challenging lighting scenarios. |
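
The SE-based fusion of the two encoder branches can be illustrated with a minimal sketch. It assumes that "SE" refers to squeeze-and-excitation channel attention and that the visible and thermal features are summed element-wise before gating; the abstract does not specify either detail, so the function names (`se_gate`, `se_fuse`), the weight layout, and the fusion order below are hypothetical and are not the MDFNet implementation. Plain Python lists are used instead of a deep learning framework to keep the example self-contained.

```python
import math


def global_avg_pool(feat):
    """Squeeze step: feat is a list of C channels, each an H x W nested list.
    Returns one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def dense(vec, weights, bias):
    """Fully connected layer; weights has shape (out, in)."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_gate(feat, w1, b1, w2, b2):
    """Excitation step: pooled channels -> FC -> ReLU -> FC -> sigmoid.
    Returns one gate in (0, 1) per channel."""
    squeezed = global_avg_pool(feat)
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def se_fuse(rgb_feat, thermal_feat, w1, b1, w2, b2):
    """Hypothetical fusion: sum the two branches, then reweight channels.
    The summation before gating is an assumption, not taken from the paper."""
    summed = [[[r + t for r, t in zip(r_row, t_row)]
               for r_row, t_row in zip(r_ch, t_ch)]
              for r_ch, t_ch in zip(rgb_feat, thermal_feat)]
    gate = se_gate(summed, w1, b1, w2, b2)
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gate, summed)]
```

In a trained network the weights `w1`, `b1`, `w2`, `b2` would be learned, so that channels carrying more useful information from either modality receive gates closer to 1 and less informative channels are suppressed.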