Research Of Weakly Supervised Image Segmentation Method Based On High-resolution Class Activation Map

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
GTID: 2428330611999986
Subject: Computer Science and Technology
Abstract/Summary:
Image segmentation is a core research problem in computer vision and image processing. Semantic image segmentation, one of its important branches, aims to classify every pixel of a given image into a known semantic category. In recent years, semantic segmentation methods based on deep convolutional neural networks have developed rapidly, and fully supervised methods that use pixel-level annotation have achieved good performance on multiple public datasets. However, this performance depends heavily on large amounts of manually labeled segmentation data. Because labeling pixel categories by hand is expensive, large pixel-level datasets are often difficult to obtain in practical applications. To reduce the model's dependence on full pixel-level annotation, weakly supervised semantic segmentation methods that use sparse annotations (such as object bounding boxes or image-level category labels) have received increasing attention.

For weakly supervised semantic segmentation based on image category labels, most mainstream methods adopt a two-stage training framework to compensate for the lack of supervision. In the first stage, the image category labels are used to train an image classification model, from which a rough localization of the target object is extracted. In the second stage, this rough localization is converted into pseudo pixel-level segmentation labels, which are then used to train a semantic segmentation model. The class activation mapping (CAM) method is commonly used to extract an attention map of the semantic target from the classification model, reflecting the approximate position of the target. The quality of this attention map affects the accuracy of the pseudo pixel-level labels and therefore the training of the segmentation model. Improving the ability of the CAM method to extract high-quality target attention maps has thus become an important issue.

An obvious problem with current CAM-based pseudo-label generation is that the CAM method extracts the target attention map from the high-level feature map output by the backbone network, so the resolution of the attention map is usually very low. Given the complex variation in shape, color, texture, and other characteristics of target objects in real image scenes, a low-resolution CAM attention map usually cannot capture the target region in fine detail.

To improve the performance of CAM-based semantic segmentation, this thesis proposes increasing the resolution of the CAM target attention map, and develops three successive improvement methods. First, inspired by the multi-scale feature pyramid model, it proposes a novel method based on multi-scale feature fusion. Second, to further improve the fine localization ability of the larger CAM attention maps, it incorporates feature enhancement based on the attention mechanism. Third, to generate high-resolution CAMs at full image size, it abandons the traditional CAM extraction method, which relies on a fixed computation, and adopts an end-to-end CAM generation model based on an encoder-decoder network. Experiments on CAM generation and segmentation network training for the three improvement methods show that, compared with traditional CAM extraction, the methods based on high-resolution CAM generation effectively improve model performance on weakly supervised semantic image segmentation tasks.
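The baseline pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It follows the commonly used CAM formulation (a class-weighted sum of the last convolutional feature channels), not the improved methods proposed in the thesis. All function names, the nearest-neighbor upsampling, and the foreground threshold of 0.3 are assumptions made for illustration only.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM for one class.

    features:   (C, H, W) high-level feature map from the backbone.
    fc_weights: (num_classes, C) weights of the final classification layer.
    """
    # Weighted sum over channels using the target class's classifier weights.
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)            # keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # normalize to [0, 1]

def upsample_nearest(cam, factor):
    """Enlarge a low-resolution CAM by an integer factor (adds no new detail)."""
    return np.repeat(np.repeat(cam, factor, axis=0), factor, axis=1)

def pseudo_labels(cams, class_ids, fg_threshold=0.3, background_id=0):
    """Turn per-class CAMs into a pseudo segmentation label map.

    cams:      list of (H, W) normalized CAMs, one per class present in the image.
    class_ids: label id for each CAM, in the same order.
    fg_threshold is an assumed value; pixels below it become background.
    """
    stacked = np.stack(cams)                      # (K, H, W)
    labels = np.asarray(class_ids)[stacked.argmax(axis=0)]
    labels[stacked.max(axis=0) < fg_threshold] = background_id
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.random((8, 4, 4))              # toy 4x4 feature map, 8 channels
    fc_weights = rng.standard_normal((3, 8))      # toy classifier with 3 classes
    cams = [upsample_nearest(class_activation_map(features, fc_weights, k), 8)
            for k in (1, 2)]
    print(pseudo_labels(cams, class_ids=[1, 2]).shape)
```

In this toy setup the CAM has only 4x4 distinct values, so after upsampling every 8x8 block of pixels shares one score. This is the resolution limit that motivates generating higher-resolution CAMs.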
Keywords/Search Tags: class activation map, encoder-decoder network, weak supervision, semantic image segmentation