Intention-Driven Visual Attention Modeling and Application

Posted on: 2023-12-11 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2532307103485544 | Subject: Control Engineering
Abstract/Summary:
"The eyes are the windows to the soul." Humans continually obtain information from their surroundings, and most of that information arrives through vision. Humans do not treat all visual information equally but process it selectively: from a large amount of visual input they select what is relevant to the current task for further processing, while suppressing distracting information unrelated to the task goal. This is the attention mechanism of the visual system. Selective attention is an important part of human cognitive function. It enables the brain to extract and process important information efficiently in limited time, and it is an important research direction in neural network technology.

This thesis focuses on intention-based visual attention modeling. Intention, as a kind of high-level information, can guide attention to intention-related regions for efficient environment perception. Because CAM (Class Activation Mapping) can generate visual attention heat maps related to target classes, the visual attention model in this thesis is built on the CAM technique. The resulting visual selective attention model is then applied to traffic scenes, where it can switch its attention target at any time as the intention changes. During autonomous driving, the model can selectively focus on specific regions and specific targets according to the driving intention, and can therefore acquire environmental information efficiently.

The thesis comprises two main contributions:

1) Model interpretability has been a hot topic in computer vision in recent years. This thesis proposes FIMF Score-CAM, a model that quickly integrates multiple features in local space. The model needs only a single forward convolution pass over the image to extract the feature maps. A feature selection template is then introduced to integrate the features of different channels in local space, which improves the completeness of the model's explanation. Finally, the integrated feature maps are used to compute their weights for the target class, and the target-class visual saliency map is generated as the weighted sum of the feature maps. FIMF Score-CAM outperforms existing mainstream models in the visual quality of its decision explanations and on fairness indicators, gives more complete explanations of target classes, and is fast to compute. For some network models with heavier convolution workloads, the running time is reduced by more than 90% compared with Score-CAM.

2) Selectively attending to specific regions and specific targets according to driving intention is of great significance for autonomous vehicles that must acquire information about the external environment efficiently. To achieve efficient environment perception, we propose an intention-driven visual attention selection model that simulates human active perception of the external environment. To improve the completeness of the attention heat map for the target category, we also propose a deep network training method with feature region enhancement. The model uses FIMF Score-CAM to generate an intention-related target attention map by weighting the feature maps extracted in the forward convolution pass, and it combines spatial attention with feature attention to improve target category localization. During training, guided random erasing forces the network to attend to a more comprehensive target-related region. This overcomes the tendency of the model to attend only to the most discriminative feature region and thereby achieves feature region enhancement. Experiments on the KITTI dataset show that the localization completeness and accuracy of our model are significantly improved compared with other top-down attention models.
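
The two mechanisms named in the abstract, a saliency map formed as a weighted sum of feature maps and guided erasing of the most salient region during training, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation. The function names, the `threshold`, `prob` and `fill` parameters, and the assumption that the saliency map has the same height and width as the image are all hypothetical, and the channel weights are taken as given rather than computed from target-class scores as a Score-CAM style method would do.

```python
import numpy as np


def weighted_saliency(feature_maps, weights):
    """Combine K feature maps of shape (K, H, W) into one (H, W) saliency map.

    The weights are assumed to be supplied by the caller; a CAM-style
    method would derive them from the target class.
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over the channel axis gives an (H, W) map.
    cam = np.tensordot(weights, feature_maps, axes=1)
    # Keep only positive evidence for the target class.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    # Scale to [0, 1] unless the map is entirely zero.
    return cam / peak if peak > 0 else cam


def guided_erase(image, saliency, threshold=0.7, prob=0.5, fill=0.0, rng=None):
    """With probability `prob`, blank out pixels whose saliency >= threshold.

    Hypothetical stand-in for guided random erasing: hiding the most
    discriminative region pushes a network to use other object parts.
    Returns the (possibly erased) image and the boolean erase mask.
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    if rng.random() >= prob:
        # Erasing is skipped for this sample.
        return image.copy(), np.zeros(saliency.shape, dtype=bool)
    mask = saliency >= threshold
    erased = image.copy()
    erased[mask] = fill  # works for (H, W) and (H, W, C) images
    return erased, mask
```

In a training loop under these assumptions, `weighted_saliency` would be evaluated on the current network's feature maps and `guided_erase` applied to the input before the next forward pass, so that the most salient region is sometimes hidden from the network.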
Keywords/Search Tags:CAM, Model interpretation, Intention-driven, Selective attention, Feature region enhancement