Recently, deep convolutional neural networks (CNNs) have become one of the most important technologies in the computer vision and image processing communities, owing to their powerful information perception and visual feature extraction capabilities. Visual feature enhancement is a key means of further improving the feature extraction capability of a CNN, and thereby the performance of the model. Hence, this paper aims to achieve effective visual feature enhancement without changing the overall architecture of the CNN. Specifically, it focuses on two key issues: adaptive visual feature enhancement guided by knowledge internal to the features, and enhancement guided by knowledge external to the features. The core technologies involved are the attention mechanism, multi-scale feature learning and fusion, dictionary learning, and the domain transform. Taking two visual tasks, i.e., salient object detection and medical image lesion segmentation, this paper first analyses the strengths and weaknesses of these visual feature enhancement technologies. It then proposes the corresponding technical solutions and confirms their effectiveness both theoretically and experimentally. The main contributions of this paper are as follows:

1. A novel Cubic Information-Embedding Attention (CIEA) mechanism that enhances essential information and filters out noise in 3D feature maps. Through two attention modules, the Spatial-Embedding Channel Attention (SECA) and the Channel-Embedding Spatial Attention (CESA), CIEA embeds spatial/channel information when generating channel/spatial attention, thereby avoiding information loss. Moreover, with the spatial/channel information embedded, the learned channel/spatial attention can encode the interrelationship between the channel and spatial dimensions, so CIEA can perceive and model global information. Finally, the two attention modules are combined into a cubic attention that achieves feature enhancement of 3D feature maps.

2. An Attentive Hierarchical Spatial Pyramid (AHSP) module for effective and lightweight multi-scale feature learning and fusion. Besides common convolution operations, this paper introduces the dilated depthwise separable convolution, and designs a spatial pyramid composed of dilated depthwise separable convolutions and feature pooling layers to learn multi-scale semantic features. The learned multi-scale feature maps are then fused gradually, from small scales to large scales, to strengthen the representation capability of multi-scale learning. In addition, an attention mechanism is applied at each scale to adaptively determine the contribution of the various scales during multi-scale learning.

3. A Knowledge Embedding Module (KEM) that achieves pixel-level feature enhancement by extending traditional dictionary learning. KEM combines the idea of dictionary learning with the end-to-end optimization of deep learning. Specifically, KEM first uses the CNN to encode the knowledge of all scenarios in the dataset, achieved by making the inherent dictionary differentiable; the encoded knowledge can be viewed as the codewords of traditional dictionary learning. The learned dictionary is then embedded into the deep feature map of the CNN in a pixel-wise manner to construct an enhanced feature map. The universal knowledge in the dictionary increases the distinguishability of each pixel in the enhanced feature map.

4. An Edge-Guided Saliency Refinement Module (ESRe) that increases detection accuracy near object boundaries. ESRe extends the domain transform from the field of one-dimensional signal processing to two-dimensional image processing. Taking the coarse saliency map and the edge probability map as input, ESRe performs a convolution on the coarse saliency map to calculate a correction value, and then refines the coarse saliency map to align better with object boundaries, guided by the edge information, which can be seen as prior knowledge. In this way, ESRe achieves the enhancement of visual features. Specifically, the edge probability map controls the relative contributions of the coarse saliency map and the correction value to the refined saliency map.
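The CIEA idea of contribution 1 can be illustrated with a minimal, parameter-free sketch: channel attention is generated from descriptors pooled separately along the height and width axes (so spatial information is embedded rather than collapsed), and spatial attention is generated from per-location channel statistics. This is only an illustration under assumed mean/max pooling and sigmoid gating; the thesis's actual SECA/CESA modules use learned layers.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def spatial_embedding_channel_attention(x):
    """SECA-style sketch: pool along H and W separately, so the channel
    attention is derived from descriptors that retain positional
    information instead of a single scalar per channel."""
    pooled_h = x.mean(axis=2)                            # (C, H) row descriptors
    pooled_w = x.mean(axis=1)                            # (C, W) column descriptors
    desc = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H+W)
    attn = _sigmoid(desc.mean(axis=1))                   # one gate per channel
    return x * attn[:, None, None]

def channel_embedding_spatial_attention(x):
    """CESA-style sketch: channel statistics at every spatial location
    produce a spatial attention map that reflects channel content."""
    stats = np.stack([x.mean(axis=0), x.max(axis=0)])    # (2, H, W)
    attn = _sigmoid(stats.mean(axis=0))                  # one gate per position
    return x * attn[None, :, :]

def cubic_attention(x):
    # Combine the two modules into one cubic enhancement of a 3D map.
    return channel_embedding_spatial_attention(
        spatial_embedding_channel_attention(x))
```

Because both gates lie in (0, 1), the combined attention rescales every entry of the 3D feature map without changing its shape or sign, which matches the role of CIEA as a drop-in enhancement.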
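The AHSP pyramid of contribution 2 can likewise be sketched: each pyramid level applies a dilated depthwise (per-channel) 3x3 convolution, a scalar softmax "attention" weight is assumed for each scale, and the weighted scales are fused gradually. The real module uses learned pointwise convolutions and pooling layers as well; this sketch only shows the dilation and weighted-fusion mechanics.

```python
import numpy as np

def dilated_depthwise_conv3x3(x, kernel, dilation):
    """Depthwise 3x3 convolution: each channel is convolved with its own
    kernel; dilation enlarges the receptive field at no parameter cost."""
    C, H, W = x.shape
    d = dilation
    pad = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[:, i, j, None, None] * pad[:, i*d:i*d+H, j*d:j*d+W]
    return out

def ahsp_sketch(x, kernels, dilations=(1, 2, 4)):
    """Pyramid sketch: one feature map per dilation rate, a softmax
    weight per scale (assumed scalar attention), fused gradually."""
    feats = [dilated_depthwise_conv3x3(x, k, d)
             for k, d in zip(kernels, dilations)]
    scores = np.array([f.mean() for f in feats])
    w = np.exp(scores - scores.max())
    w /= w.sum()                                  # per-scale attention weights
    fused = feats[-1] * w[-1]
    for f, wi in zip(reversed(feats[:-1]), reversed(w[:-1])):
        fused = fused + f * wi                    # gradual scale-by-scale fusion
    return fused
```

Depthwise convolution is what keeps the module lightweight: a 3x3 depthwise layer costs 9 weights per channel, versus 9·C weights per output channel for a standard convolution.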
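For the KEM of contribution 3, the differentiable-dictionary idea can be sketched as a soft assignment: every pixel feature is softly assigned to the codewords (softmax over negative distances, which is differentiable, so the dictionary can be trained end-to-end), and the aggregated codeword is embedded back into that pixel. The aggregation and embedding rules here are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np

def knowledge_embed(feature_map, dictionary, temperature=1.0):
    """KEM-style sketch: soft, differentiable dictionary lookup per pixel,
    followed by pixel-wise embedding of the aggregated codeword."""
    C, H, W = feature_map.shape
    f = feature_map.reshape(C, -1).T                       # (H*W, C) pixel features
    # Squared distance from each pixel feature to each codeword.
    d2 = ((f[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)   # (H*W, K)
    a = np.exp(-d2 / temperature)
    a /= a.sum(axis=1, keepdims=True)                      # soft assignment weights
    knowledge = a @ dictionary                             # (H*W, C) aggregated codewords
    enhanced = f + knowledge                               # pixel-wise embedding (assumed additive)
    return enhanced.T.reshape(C, H, W)
```

Because the softmax assignment is smooth in both the features and the codewords, gradients flow to the dictionary exactly like any other network weight, which is the property that lets KEM extend classical dictionary learning into an end-to-end pipeline.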
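Finally, the edge-guided refinement of contribution 4 can be sketched with the classic recursive form of the domain transform, extended to 2D as alternating horizontal and vertical 1D passes: each pixel is a convex combination of itself and its already-filtered neighbour, with a propagation weight that the edge probability map drives toward zero at boundaries, so smoothing stops there. The gating function `exp(-sigma * edge)` is an assumption; the thesis's ESRe computes the correction with a learned convolution instead.

```python
import numpy as np

def domain_transform_refine(coarse, edge_prob, sigma=1.0):
    """ESRe-style sketch: recursive edge-aware filtering of a coarse
    saliency map, run left-right, right-left, then (via transpose)
    top-bottom and bottom-up, so 1D recursions cover the 2D map."""
    w = np.exp(-sigma * edge_prob)        # high edge prob -> small propagation
    out = coarse.astype(float).copy()
    for _ in range(2):                    # horizontal pass, then vertical
        for row in range(out.shape[0]):
            for col in range(1, out.shape[1]):              # forward sweep
                out[row, col] = ((1 - w[row, col]) * out[row, col]
                                 + w[row, col] * out[row, col - 1])
            for col in range(out.shape[1] - 2, -1, -1):     # backward sweep
                out[row, col] = ((1 - w[row, col]) * out[row, col]
                                 + w[row, col] * out[row, col + 1])
        out = out.T                       # transpose to reuse the row loop
        w = w.T
    return out                            # back in original orientation
```

The edge probability map plays exactly the controlling role described above: where it is high, the propagation weight vanishes and the coarse saliency value is kept; where it is low, neighbouring values blend in, pulling the refined map into alignment with object boundaries.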