Image semantic segmentation is a crucial topic in computer vision, as it classifies each pixel of an image according to its semantic content. The technology is widely used in autonomous driving, intelligent healthcare, geographic information systems, and precision agriculture, among other fields. Traditional semantic segmentation methods extract only low-level feature information from images, whereas convolutional neural networks provide a more precise analysis of image features and richer semantic information. Consequently, semantic segmentation methods based on deep convolutional neural networks have received widespread attention and application. Among them, the DeepLabv3+ model achieves good segmentation performance but still suffers from the following problems: 1) it cannot effectively handle the anisotropy of objects in real scenes and fails to segment objects with large scale differences within the same scene; 2) boundary position details are lost during feature extraction by the deep convolutional network, which degrades the model's segmentation performance; 3) the upsampling scheme of two consecutive bilinear interpolations is overly simple, and such data-independent interpolation cannot adequately restore important information, resulting in poor segmentation performance. To address these problems, this paper proposes improvements to the DeepLabv3+ model, specifically: (1) To handle the diversity of object scales in real scenes, the Cross Atrous Spatial Pyramid Pooling (C-ASPP) module is introduced to capture local and long-range feature information through long, narrow pooling kernels. This module improves the model's ability to capture global contextual information and reduces interference from irrelevant regions when dealing with irregularly shaped objects. (2) To address the loss of boundary detail information, this paper proposes the
Boundary Enhancement (BE) module, which enhances the boundary features of shallow feature maps, as these contain abundant boundary details. The enhanced feature map is then processed with convolutional attention to increase the model's sensitivity to object edge features, thereby mitigating the loss of boundary information. (3) To improve upsampling, this paper proposes a Residual Feature Fusion (RFF) module that fuses shallow and deep feature maps, extracts information at different scales, and reduces information loss. Placing this module between the two four-fold upsampling steps improves the restoration of important feature information and hence the pixel classification performance. To validate the segmentation performance of the proposed method, experiments were conducted on the publicly available PASCAL VOC 2012 and ADE20K datasets. Comparative results show that the proposed model outperforms DeepLabv3+ and several mainstream deep-neural-network-based semantic segmentation methods in terms of parameter count, computational complexity, and overall segmentation accuracy. Compared with Transformer-based methods, the proposed method achieves similar segmentation performance with fewer parameters. These comparisons demonstrate the practical value and high segmentation performance of the proposed method. In addition, ablation experiments validate the effectiveness and rationality of each module proposed in this study.
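The "long, narrow pooling kernels" described for C-ASPP are in the spirit of strip pooling: averaging along whole rows and columns so each position can see distant context along one axis without absorbing unrelated regions along the other. The following is a minimal single-channel NumPy sketch of that idea only; the actual C-ASPP design (atrous rates, fusion convolutions, gating) is not specified here, and the additive fusion below is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def strip_pool(feat):
    """Mix row-strip and column-strip context into a feature map.

    feat: (H, W) single-channel feature map.
    Returns a (H, W) map where each position is augmented with the
    mean of its entire row (1 x W strip) and column (H x 1 strip).
    """
    row_ctx = feat.mean(axis=1, keepdims=True)  # (H, 1): horizontal strip pooling
    col_ctx = feat.mean(axis=0, keepdims=True)  # (1, W): vertical strip pooling
    # Broadcast both strips back to (H, W) and fuse by simple addition;
    # a real module would typically refine each strip with a 1-D conv
    # and gate the result (e.g. with a sigmoid) before fusing.
    return feat + row_ctx + col_ctx

x = np.arange(12, dtype=float).reshape(3, 4)
y = strip_pool(x)
print(y.shape)  # (3, 4)
```

Because the pooling window spans a full row or column but is only one pixel wide, an elongated object (a pole, a road marking) contributes context along its own axis while nearby background contributes little, which is the stated motivation for handling anisotropic shapes.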