
Image Semantic Segmentation Based On Multi-level Feature Fusion And Attention Mechanism

Posted on: 2021-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: G H Wang
GTID: 2428330611964278
Subject: Computer application technology
Abstract/Summary:
Image semantic segmentation is a basic task in computer vision and a key step toward machine understanding of images. Early segmentation methods divide an image into regions according to low-level image features, but they cannot recognize what each region represents. This differs from the way humans segment an image by category, so these methods are of limited practical use. With the development of deep convolutional networks, the fully convolutional network extended models designed for image classification to pixel-wise classification. Assigning a category label to every pixel divides the image into category regions, which is image semantic segmentation. Research on semantic segmentation informs other machine vision tasks that also depend on image details, such as object detection and super-resolution reconstruction, and it is widely applied in autonomous driving, medical image segmentation, pedestrian detection, and other fields.

Deep convolutional models based on the fully convolutional network perform end-to-end image semantic segmentation. The strong semantic abstraction ability of convolutional neural networks is their main advantage, allowing the model to predict pixel categories accurately. However, limitations of the network structure and training process also pose challenges. Research on image semantic segmentation follows three main directions: the backbone network, which abstracts semantic features; the head network, which upsamples and improves feature quality; and the loss function, which helps the network parameters converge better. This thesis studies the head network and the loss function. The main contributions are as follows:

(1) Because the receptive field of a convolutional neural network is limited, the features extracted by the backbone may show intra-class inconsistency and inter-class indistinction. More context information is therefore needed to model the relationships between pixels and obtain more discriminative semantic features. To address this, this thesis proposes a context attention unit. First, the coarse segmentation results are aggregated with the per-pixel features to obtain a set of category features. Next, the similarity between each pixel's features and each category feature is computed to update the coarse segmentation results. The updated segmentation results are then used to allocate the category features to each pixel, producing an attention feature map. Finally, the attention feature map and the original feature map are fused by summation.

(2) Feature maps extracted at different stages of a convolutional neural network have different characteristics. Low-level features have a small receptive field and capture only local patterns, but their feature maps have high resolution and retain rich detail. Deep features are abstracted through many convolution layers and can therefore be classified more accurately. To combine the advantages of features at different levels, this thesis proposes a gate fusion refine unit. A gating mechanism selects the areas that single-level features cannot identify accurately, and an additional convolution layer fuses the multi-level features to supplement those areas.

(3) Multi-level feature fusion aims to use the characteristics of different levels to identify hard pixels more accurately. To make the model focus on these hard pixels and to improve the effect of the multi-level feature fusion module, this thesis analyzes how hard pixels are distributed across categories and regions. A new loss function is then proposed that increases the weight of hard pixels, enabling the model to learn parameters with higher segmentation accuracy.

(4) Experiments on three commonly used semantic segmentation datasets quantify how much each module improves segmentation accuracy. Comparisons of performance and results with similar algorithms demonstrate the advantages of the proposed algorithm, and visualizations of each module show its effect on the segmentation results.
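
The four steps of the context attention unit in item (1) can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not give the exact formulas, so the sketch assumes a probability-weighted average for the category features, a dot product for the similarity, and a softmax to turn similarities into updated segmentation scores. The function names and the flattened pixel layout (a list of N pixels, each with C channels) are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def context_attention(features, coarse_probs):
    """Sketch of a context attention unit (assumed formulas, see lead-in).

    features:     N pixels x C channels (spatial dimensions flattened)
    coarse_probs: N pixels x K classes, the coarse segmentation result
    Returns (fused features N x C, updated segmentation N x K).
    """
    n, c = len(features), len(features[0])
    k = len(coarse_probs[0])

    # Step 1: aggregate pixel features into one feature vector per category,
    # weighting each pixel by its coarse probability for that category.
    category = []
    for cls in range(k):
        weight = sum(coarse_probs[i][cls] for i in range(n)) + 1e-8
        category.append([
            sum(coarse_probs[i][cls] * features[i][ch] for i in range(n)) / weight
            for ch in range(c)
        ])

    # Step 2: pixel-to-category similarity (assumed: dot product), normalised
    # with a softmax to give the updated segmentation result.
    updated = [
        softmax([sum(x * f for x, f in zip(features[i], category[cls]))
                 for cls in range(k)])
        for i in range(n)
    ]

    # Step 3: allocate category features back to each pixel using the
    # updated segmentation result as mixing weights.
    attention = [
        [sum(updated[i][cls] * category[cls][ch] for cls in range(k))
         for ch in range(c)]
        for i in range(n)
    ]

    # Step 4: fuse the attention features with the original features by summation.
    fused = [[x + a for x, a in zip(features[i], attention[i])] for i in range(n)]
    return fused, updated
```

Calling `context_attention(features, coarse_probs)` on four 2-channel pixels with two classes returns a 4 x 2 fused feature list and a 4 x 2 updated segmentation whose rows each sum to 1. In a real network the same steps would operate on tensors and would typically include learned projections, which this sketch omits.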
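
The gating idea in item (2) can be sketched in a similar way. The abstract does not say how the gate is computed, so this example makes two loud assumptions: the gate for a pixel is one minus the highest class probability predicted from the deep features (low confidence opens the gate), and the "additional convolution layer" is modelled as a 1x1 convolution, that is, a per-pixel linear map over the concatenated deep and low-level features. All names and the weight matrix are hypothetical.

```python
def gate_fusion(deep, low, deep_probs, weights):
    """Sketch of gated multi-level fusion (assumed gate and 1x1 convolution).

    deep:       N pixels x C deep-feature channels
    low:        N pixels x C_low low-level channels (same resolution assumed)
    deep_probs: N pixels x K class probabilities predicted from deep features
    weights:    C x (C + C_low) matrix standing in for a 1x1 convolution
    Returns refined features, N pixels x C.
    """
    refined = []
    for x_deep, x_low, probs in zip(deep, low, deep_probs):
        # Assumption: pixels the deep features classify with low confidence
        # are the ones that need help from the other levels.
        gate = 1.0 - max(probs)

        # Fuse the levels: concatenate channels, then apply the linear map.
        concat = x_deep + x_low
        mixed = [sum(w * v for w, v in zip(row, concat)) for row in weights]

        # The fused features supplement the deep features only where the
        # gate is open; confident pixels pass through unchanged.
        refined.append([d + gate * m for d, m in zip(x_deep, mixed)])
    return refined
```

With this design a pixel predicted with probability 1.0 keeps its deep features exactly, while an ambiguous pixel receives a correspondingly larger share of the fused multi-level features.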
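
The abstract does not define the loss function proposed in item (3), so it cannot be reproduced here. As a stand-in that shows what "increasing the weight of hard pixels" can mean, the sketch below uses the well-known focal loss, which scales the cross-entropy of each pixel by (1 - p)^gamma, where p is the predicted probability of the true class. This is a substitute technique chosen for illustration, not the thesis's loss, and it ignores the category and region analysis the thesis describes.

```python
import math


def per_pixel_focal_loss(probs, labels, gamma=2.0):
    """Focal-style per-pixel loss, used here as a stand-in (see lead-in).

    probs:  N pixels x K class probabilities
    labels: N true class indices
    gamma:  focusing strength; gamma = 0 gives plain cross-entropy
    Returns a list with one loss value per pixel.
    """
    losses = []
    for pixel_probs, label in zip(probs, labels):
        # Clamp to avoid log(0) for pixels predicted with zero probability.
        p_true = min(max(pixel_probs[label], 1e-12), 1.0)
        # Well-classified pixels (p_true near 1) are down-weighted, so hard
        # pixels contribute a larger share of the total loss.
        losses.append(-((1.0 - p_true) ** gamma) * math.log(p_true))
    return losses


def mean_loss(probs, labels, gamma=2.0):
    """Average the per-pixel losses into a single training objective."""
    losses = per_pixel_focal_loss(probs, labels, gamma)
    return sum(losses) / len(losses)
```

Comparing `per_pixel_focal_loss(probs, labels, gamma=0.0)` with `gamma=2.0` on the same predictions shows the effect: the harder a pixel is, the larger its share of the total loss becomes as gamma grows.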
Keywords/Search Tags: Deep Learning, Semantic Segmentation, Multi-level Feature Fusion, Attention Mechanism, Gate Mechanism