Image Semantic Segmentation Based On Self-attention Mechanism And Encoding-decoding Network

Posted on: 2022-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhou
GTID: 2518306737456434
Subject: Control Science and Engineering
Abstract/Summary:
Image segmentation has become a research hotspot in computer vision. Semantic segmentation is used in many fields, such as autonomous driving, medical imaging, and new retail. For the technology to be applied correctly in each field, the region occupied by each class of object in the image must be segmented accurately. Traditional image segmentation algorithms rely on hand-designed features, on which a classifier is then built. However, extracting features manually is time-consuming and complicated, so this thesis uses deep-learning-based semantic segmentation to extract image features automatically and to segment each class of object in the image correctly.

A review of the literature and experimental verification show that capturing contextual information at multiple scales is an effective way to improve segmentation accuracy, and the Atrous Spatial Pyramid Pooling (ASPP) module is a notable example. Because convolution kernels of different sizes emphasize different image features, multi-scale information should not be fused by simple concatenation; instead, each scale should be given its own weight. Therefore, Euclidean distance is introduced into the ASPP structure as an attention mechanism, and the importance of each feature map is calculated from Euclidean distance. In addition, in the decoding stage, the high- and low-dimensional feature maps are recombined to make up for the detail lost during down-sampling. On this basis, a semantic segmentation network based on the attention mechanism and an encoder-decoder structure is proposed. With ResNet50 as the backbone, it achieves an mIoU of 73.45% on PASCAL VOC2012 and 64.27% on Cityscapes.

Although capturing context at multiple scales improves segmentation accuracy, multi-scale fusion is essentially a fusion of local features. To capture long-range context, a vertical and horizontal compression attention module inspired by DANet is proposed; it requires less computation and achieves higher accuracy than the position attention module in the original DANet. In the decoding stage, previous work applied global average pooling to the high-level feature map to generate a weight vector that guides the selection of low-level feature details. In contrast, this thesis compresses the high-level feature maps with pooling structures at several scales to extract the weight vector, and uses this vector to guide the extraction of spatial detail from the low-level feature maps. On this basis, a semantic segmentation network based on self-attention feature fusion is proposed. With ResNet50 as the backbone, it achieves an mIoU of 76.42% on PASCAL VOC2012 and 73.13% on Cityscapes. For comparison with the previous method, experiments were also carried out with ResNet50; the results show an improvement of 8.86 percentage points on the Cityscapes data set (64.27% to 73.13%).
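
The abstract states that the importance of each multi-scale feature map is calculated from Euclidean distance, but it does not say how. The sketch below shows one possible reading and is not the thesis's implementation. It assumes each ASPP branch has been flattened to a plain list of floats, measures each branch's distance to the mean of all branches, and turns the negative distances into weights with a softmax. The function names, the choice of the mean as reference point, and the sign convention are all hypothetical.

```python
import math


def euclidean_fusion_weights(branches):
    """Return one weight per branch, derived from Euclidean distances.

    Assumption (not from the thesis): the reference point is the
    element-wise mean of all branches, and branches closer to that
    mean receive larger weights via a softmax over negative distance.
    """
    n = len(branches)
    dim = len(branches[0])
    mean = [sum(b[i] for b in branches) / n for i in range(dim)]
    dists = [
        math.sqrt(sum((b[i] - mean[i]) ** 2 for i in range(dim)))
        for b in branches
    ]
    logits = [-d for d in dists]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(branches, weights):
    """Weighted sum of the branches, replacing plain concatenation."""
    dim = len(branches[0])
    return [
        sum(w * b[i] for w, b in zip(weights, branches))
        for i in range(dim)
    ]


# Three toy "feature maps" flattened to two values each (made-up numbers).
toy_branches = [[1.0, 1.0], [1.0, 1.0], [4.0, 5.0]]
toy_weights = euclidean_fusion_weights(toy_branches)
toy_fused = fuse(toy_branches, toy_weights)
```

In this toy input the third branch lies farthest from the mean, so it receives the smallest weight. A real network would compute such weights on tensors and could just as plausibly use a learned reference or the opposite sign convention.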
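
The results above are reported as mIoU (mean intersection over union). For readers unfamiliar with the metric, here is a minimal sketch of how it is commonly computed from a pixel-level confusion matrix. The function name and the toy matrix are illustrative assumptions, and benchmark-specific details (ignored labels, per-image versus per-dataset accumulation) are left out.

```python
def mean_iou(confusion):
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and
    whose predicted class is j. Classes that appear in neither the
    ground truth nor the prediction are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example with made-up pixel counts.
toy_confusion = [[3, 1],
                 [1, 5]]
toy_miou = mean_iou(toy_confusion)  # class IoUs are 3/5 and 5/7
```

Because each class contributes equally to the average, a small class that is segmented poorly lowers mIoU as much as a large one would.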
Keywords/Search Tags: semantic segmentation, multi-scale features, attention mechanism, encoder-decoder structure