
Image Semantic Segmentation Algorithm Based On Deep Learning And Attention Mechanism

Posted on: 2023-06-20    Degree: Master    Type: Thesis
Country: China    Candidate: S L Xiang    Full Text: PDF
GTID: 2568307103485604    Subject: Control Engineering
Abstract/Summary:
In recent years, thanks to its powerful feature extraction capability, deep learning has achieved significant breakthroughs in many computer vision tasks. As an important research direction in computer vision, image semantic segmentation based on deep learning has attracted sustained attention and the work of a large number of researchers. Current research mainly focuses on improving the accuracy, speed, and domain adaptability of segmentation algorithms. Existing networks have achieved considerable results, but because of the demanding accuracy, robustness, and real-time requirements placed on them, many challenges remain in image semantic segmentation. This thesis therefore proposes deep networks based on the attention mechanism and the Transformer, targeting the shortcomings of existing semantic segmentation networks in acquiring global contextual information and in multi-scale feature representation.

Traditional semantic segmentation models can only capture a small range of contextual information. To capture long-distance dependencies, an adjacent position attention module is proposed, which focuses on the dependencies between each pixel and its adjacent pixels in the feature map. Combining adjacent position attention with a channel attention module forms a new dual attention model that is lighter and more effective than previous networks. In a semantic segmentation network, low-level features carry more location and detail information, while high-level features carry rich semantic and category information. To make effective use of both high-level and low-level features and improve segmentation performance, a cross-dimensional interactive attention model is proposed, which captures the dependencies between different dimensions of the feature map through dimension interaction; the decoder uses this model to fuse features from different stages, so that both semantic information and detail information stand out well. On this basis, a multi-attention network for image semantic segmentation is proposed and effectively verified on two benchmark datasets, PASCAL VOC 2012 and Cityscapes.

However, in most attention-based semantic segmentation networks, attention is modeled on high-dimensional feature maps, which does not fundamentally solve the acquisition of long-range context information. To address this, the vision Transformer, which has excellent global modeling ability, is introduced. Yet most decoders in Transformer-based segmentation networks use a fixed single-scale window for modeling, ignoring the effect of window size on model performance. To build multi-scale information modeling, a dynamic multi-scale window Transformer is proposed, which can adaptively capture contextual information at multiple scales. To let low-level pixel-level features refine high-level semantic-level features, a bottom-up refinement feature pyramid Transformer module is proposed. On this basis, a multi-scale Transformer for image semantic segmentation is proposed and effectively validated on the Cityscapes, ADE20K, and PASCAL VOC 2012 datasets.
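As a rough illustration of the channel attention idea used in the dual attention model, the sketch below applies squeeze-and-excitation-style channel gating (global average pooling followed by a sigmoid gate) to a C x H x W feature map stored as nested Python lists. The function name and the parameter-free gate are illustrative assumptions; the thesis's actual module is not specified in this abstract and would normally include learned weights.

```python
import math

def channel_attention(fmap):
    """Reweight the channels of a C x H x W feature map (nested lists).

    Illustrative squeeze-and-excitation-style gating:
    global average pool per channel -> sigmoid gate -> channel-wise scaling.
    A real module would insert learned fully connected layers before the gate.
    """
    C = len(fmap)
    # Squeeze: global average pooling gives one descriptor per channel.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excite: sigmoid gate on each pooled descriptor (no learned weights here).
    gates = [1.0 / (1.0 + math.exp(-d)) for d in desc]
    # Scale every value in each channel by that channel's gate.
    return [[[v * gates[c] for v in row] for row in fmap[c]]
            for c in range(C)]

# Two 2x2 channels: a strongly activated channel and an inactive one.
# The active channel is kept (gate near 1), the inactive one is halved or zeroed.
out = channel_attention([[[2.0, 2.0], [2.0, 2.0]],
                         [[0.0, 0.0], [0.0, 0.0]]])
```

The point of the gating step is that channels with strong global responses are emphasized relative to weak ones, which is the mechanism the dual attention model combines with position attention.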
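The multi-scale window idea can be sketched as follows: the feature map is partitioned into non-overlapping windows at several window sizes, and window attention is then computed within each window at each scale. The function names and the fixed size set below are hypothetical; the "dynamic" part of the proposed model (adaptively weighting or selecting scales per input) is only noted in a comment.

```python
def window_partition(h, w, size):
    """Origins of non-overlapping size x size windows tiling an h x w grid.

    Edges are assumed to be zero-padded so h and w become multiples of
    `size`, a common convention in window-attention models.
    """
    hp = -(-h // size) * size  # padded height (ceil to multiple of size)
    wp = -(-w // size) * size  # padded width
    return [(r, c) for r in range(0, hp, size) for c in range(0, wp, size)]

def multiscale_windows(h, w, sizes=(2, 4, 8)):
    """Window sets at several scales.

    A dynamic multi-scale window Transformer would attend within each
    window and learn to weight or select among these scales per input;
    here we only build the window grids themselves.
    """
    return {s: window_partition(h, w, s) for s in sizes}

# An 8x8 feature map tiles into 16 windows of size 2, 4 of size 4, 1 of size 8.
wins = multiscale_windows(8, 8)
```

Smaller windows localize fine detail while the largest window spans the whole map, which is how multiple window sizes provide context at several scales simultaneously.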
Keywords/Search Tags: Semantic segmentation, Deep learning, Attention, Transformer, Multi-scale representation, Encoder-decoder