With advances in science and technology, image semantic segmentation based on deep learning is increasingly used in daily life. In urban road scenes, high-precision semantic segmentation provides important information for intelligent vehicle planning and decision-making and is an important guarantee of vehicle safety. Building on the DeepLabV3+ model, this paper studies semantic segmentation in road scenes, analyzes from the perspective of the model itself why urban road scene images are difficult to segment, and tests and improves the encoder-decoder structure of DeepLabV3+, which effectively improves its segmentation performance. The specific contents and innovations of this paper are as follows: (1) Transfer learning (model migration) and a weighted loss function are adopted to address the lack of training data and the class imbalance in small-sample street view image datasets. Experimental results show that the model trained with transfer learning converges faster and has better robustness and generalization ability, and that combining transfer learning with the weighted loss function effectively improves the recognition rate and segmentation accuracy for small-scale target objects and improves the overall segmentation performance of the model. (2) Because the backbone feature extraction network of DeepLabV3+ has a large number of parameters, which hinders practical application, common convolutional neural networks are used as the backbone, and the segmentation performance of DeepLabV3+ with ResNet50, ResNeXt50 and MobileNetV2 as the feature extraction network is investigated. Adjusting the output stride reduces the feature loss during downsampling and effectively improves the segmentation performance of the model. When the output stride is set to 8, the
segmentation performance of the model is best; considering practical application, however, the experimental comparison shows that an output stride of 16 gives the best balance between parameter count, complexity and segmentation accuracy. (3) To address the poor performance of DeepLabV3+ on the details of small-scale objects and on target boundaries, the original decoder structure of DeepLabV3+ is improved. The main work covers two aspects. First, the global convolution module and the boundary refinement module are used to further extract local features of the street view image from the shallow network. The resulting GCN-DeepLabV3+ and GCNG-DeepLabV3+ models carry more detailed information and represent the details of image segmentation better. Second, the convolutional block attention module (CBAM) is used to apply attention to the local features extracted from the shallow network along the channel and spatial dimensions. With this attention mechanism, the CBAM-DeepLabV3+ model extracts more critical local features and improves the segmentation of small-scale target objects and boundary details. Experimental results show that the DeepLabV3+ model with the improved decoder structure is better suited to the semantic segmentation of urban road scenes.
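
The weighted loss function in contribution (1) can be illustrated with a minimal sketch. The abstract does not state how the class weights are chosen, so the inverse-frequency rule used here is an assumption, as are the function names `inverse_frequency_weights` and `weighted_cross_entropy` and the toy data; the sketch only shows why up-weighting rare classes makes errors on small-scale objects count for more.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Assumed weighting rule (not from the paper): w_c = N / (num_classes * n_c).

    Classes that never occur get weight 0.0, since they cannot
    contribute to the loss anyway.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(probs, labels, weights=None):
    """Weighted mean of per-pixel cross-entropy.

    probs[i] is the predicted class distribution for pixel i and
    labels[i] its ground-truth class. The sum is normalized by the
    total weight of the labeled pixels; with weights=None every pixel
    counts equally.
    """
    if weights is None:
        weights = [1.0] * len(probs[0])
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


# Toy "image": 9 pixels of a frequent class (0) and 1 pixel of a rare class (1).
labels = [0] * 9 + [1]
# A hypothetical model that favors the frequent class at every pixel.
probs = [[0.9, 0.1] for _ in labels]

weights = inverse_frequency_weights(labels, num_classes=2)
plain = weighted_cross_entropy(probs, labels)
weighted = weighted_cross_entropy(probs, labels, weights)
print(weights, plain, weighted)
```

In this toy case the single rare-class pixel, which the hypothetical model gets wrong, dominates the weighted loss, so the weighted value is larger than the unweighted one and the error on the rare class is penalized more strongly.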