| With the continued development of deep learning and artificial intelligence, sophisticated perception technologies such as face recognition, speech recognition, and autonomous driving have gradually entered every aspect of people’s lives. Among these, autonomous driving is closely tied to people’s future lives, and semantic image segmentation is an essential task throughout the autonomous driving pipeline. Unlike object detection and image recognition, semantic segmentation performs pixel-level recognition and therefore yields a more detailed analysis of the image than those tasks can provide. The resulting semantic information can then be passed to downstream tasks in the autonomous driving system. Semantic image segmentation is therefore of great significance for autonomous driving and intelligent road-condition analysis. To obtain a more accurate segmentation network for road traffic scenes, this paper proposes an adaptive up-sampling fully convolutional network that addresses the shortcomings of classical networks such as FCN and SegNet. The network's multi-channel dilated convolution feature fusion module handles the variation in object scale found in road traffic scenes. Dilated convolution also enlarges the receptive field, which helps segment large nearby objects correctly. The adaptive up-sampling module at the end of the network replaces traditional deconvolution or linear interpolation. Combined with an attention mechanism, it lets the network better understand the image during decoding and assigns different weights to different objects in the current scene, which makes training more focused. To obtain a network that outperforms the
adaptive up-sampling network, the feature fusion module is improved on the basis of the ExFuse network, and a multistage feature fusion module is proposed. Using convolutional layers, this module first aligns features from different levels and then fuses them, which reduces the noise introduced by fusion. The module also uses multiplexing convolution to expand the receptive field. To further refine object boundaries after segmentation, a boundary refinement module is proposed. Finally, in the decoder, an enhanced up-sampling module is built on the adaptive up-sampling module: two attention networks are combined through a residual-like connection, yielding a more effective attention network that better understands the image and places more emphasis on different objects in the current scene during training. |
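
The abstract states that "expansion" (dilated) convolution enlarges the receptive field and that several such branches are fused to cope with objects of different scales. The following is a minimal 1D sketch of that idea in plain Python, not the paper's implementation. The function names, the dilation rates (1, 2, 4), and fusion by element-wise summation are all illustrative assumptions; the abstract does not specify the rates or the fusion operator.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation form).

    Assumes an odd kernel length so the kernel is centred on each output position.
    """
    k = len(kernel)
    half = (effective_kernel_size(k, dilation) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):     # out-of-range taps read zero padding
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_branch_fusion(signal, kernel, dilations=(1, 2, 4)):
    """Run parallel dilated branches and fuse them by element-wise sum.

    Hypothetical fusion rule: the source does not say how branches are combined.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


# A 3-tap kernel covers 3, 5, and 9 input positions at dilation 1, 2, and 4.
spans = [effective_kernel_size(3, d) for d in (1, 2, 4)]
```

With the same three weights, each branch sees a wider context as the dilation grows, which is why fusing branches with different rates gives the network access to several object scales at once without adding parameters per branch.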