| Infrared small target detection is a key technology in infrared search and tracking systems, and its performance determines the response time of early warning and precision guidance systems. Limited by the characteristics of infrared imaging and the demands of long-distance imaging, infrared small targets lack intrinsic features. Moreover, because of heavy background clutter and complex image scenes, infrared small target detection still faces major challenges in accuracy and robustness despite years of development. To tackle these issues, this dissertation follows the progression from model-driven methods to data-driven methods and then to model-driven deep learning. First, a low-rank plus sparse decomposition model is built to describe infrared small targets more accurately. Then, to explore new attention mechanisms and their more diverse applications in deep networks, several pixel-level annotated small target datasets are constructed. Finally, an end-to-end model that combines deep neural networks with traditional model-driven methods is proposed for infrared small target detection. The detailed contributions are summarized as follows: (1) To address the problem that existing sparse constraints cannot effectively distinguish the real target from equally sparse strong-edge residue, a reweighted infrared patch-tensor model is constructed that selectively suppresses sparse elements by applying different weights to different image contents. First, to extract more information from the non-local self-correlation property in patch space, a new infrared patch-tensor model is constructed, and the separation of small targets from the background is modeled as a robust low-rank tensor recovery problem. Second, based on the structure tensor, element-wise local structure weights are designed to replace the original global parameter, so that the model can adaptively adjust the shrinkage thresholds according to the local
structure weights in iterations, thereby reducing the false alarms caused by relatively sparse background components. Finally, a new stopping criterion is designed according to the sparsity of the target patch-tensor, and a sparsity enhancement weight is used to reduce the number of iterations. Compared with other low-rank plus sparse decomposition methods, this model greatly improves the detection speed while better suppressing complex cloud clutter. (2) Model-driven infrared small target detection methods suffer from insufficient discriminative ability and from hyperparameters that are sensitive to image content. To tackle these issues, an asymmetric bi-directional attentional modulation network is designed to automatically learn the semantic features of infrared small targets in an end-to-end manner. First, a single-frame infrared small target detection benchmark dataset is constructed and annotated in five different forms, followed by a statistical analysis of the characteristics of infrared small targets. Second, to resolve the conflict between feature resolution and semantic level, an asymmetric bi-directional attention modulation mechanism is proposed to achieve a cross-layer exchange of high-level semantic information and fine target details. The top-down modulation pathway adopts a global attention module, which feeds the semantic information of high-level features back to the low-level features and encodes the target context; the bottom-up modulation pathway uses a local attention module to embed the details of the low-level features into the high-level features. Compared with traditional model-driven methods, the proposed network significantly improves the performance of infrared small target detection. (3) Inspired by the similarities between the attention mechanism and the activation function, a novel type of activation unit, called the attentional activation unit, is proposed, which can selectively activate features based on
contextual information in a layer-wise manner. To meet the locality requirement, attentional activation units are a series of lightweight attention modules that aggregate only local feature contexts, e.g., the local channel attention module. By replacing the original activation functions in a network with attentional activation units, a fully attentional network can be constructed, which encodes high-level semantics more efficiently because irrelevant features are suppressed in early stages. In addition, to verify the small target detection methods on a larger-scale dataset, a dim iceberg detection dataset is constructed, which shares similar characteristics with infrared small targets. Ablation and comparative experiments on multiple computer vision tasks show that, given the same host network, the attentional activation unit greatly improves the performance of various networks compared with other activation units. (4) To tackle the scale inconsistency issue of feature fusion in deep networks, a uniform and general framework called attentional feature fusion is proposed, which is applicable to most common scenarios, including feature fusion induced by short and long skip connections as well as within Inception layers. To better fuse features of inconsistent semantics and scales, a multi-scale channel attention module is constructed, which addresses issues that arise when aggregating feature contexts given at different scales. It is also demonstrated that the initial integration of feature maps can become a bottleneck and that this issue can be alleviated by adding another level of attention, which is referred to as iterative attentional feature fusion. Given a comparable number of parameters, models with attentional feature fusion outperform state-of-the-art networks on multiple datasets, which suggests that more sophisticated attention mechanisms for feature fusion hold great potential to yield better results than their direct counterparts. (5) To tackle the issue
of minimal intrinsic characteristics, a novel model-driven deep network named the attentional local contrast network is proposed for infrared small target detection, which combines discriminative networks with conventional model-driven methods to make use of both labeled data and the domain knowledge of the local contrast prior. With a feature map cyclic shift trick, the modularized local contrast measure, acting as a nonlinear feature transformation layer with a specific physical mechanism, can explicitly break the limitation of the effective receptive field and capture the interaction between local features and their regional contexts. To highlight and preserve the subtle information of small targets, a bottom-up local attentional modulation module is adopted to dynamically encode the smaller-scale details into high-level feature maps. From the perspective of model-driven methods, the attentional local contrast network replaces simple features such as mean and maximum values with semantic features learned from the annotated data, thus greatly improving the performance of infrared small target detection. |
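
The feature map cyclic shift idea in contribution (5) can be illustrated with a minimal sketch. It is not the dissertation's layer: the names `DIRECTION_PAIRS`, `cyclic_shift`, and `local_contrast` are hypothetical, the map is a single 2D channel stored as nested lists, and the contrast rule (multiply the center-minus-neighbor differences of two opposite directions, then take the minimum over the four direction pairs) is an assumption chosen for illustration. A real network would apply the same shifts to learned feature tensors.

```python
# Opposite-direction neighbour pairs (dy, dx) in an 8-neighbourhood.
DIRECTION_PAIRS = [
    ((-1, -1), (1, 1)),
    ((-1, 0), (1, 0)),
    ((-1, 1), (1, -1)),
    ((0, -1), (0, 1)),
]


def cyclic_shift(fmap, dy, dx):
    """Shift a 2D map (list of rows) by (dy, dx), wrapping around the borders."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[(i - dy) % h][(j - dx) % w] for j in range(w)] for i in range(h)]


def local_contrast(fmap, dilation=1):
    """Illustrative local contrast map (assumed rule, see the lead-in)."""
    h, w = len(fmap), len(fmap[0])
    out = [[float("inf")] * w for _ in range(h)]
    for (dy1, dx1), (dy2, dx2) in DIRECTION_PAIRS:
        # Shifting the whole map aligns each pixel with one of its neighbours,
        # so no sliding window is needed.
        side_a = cyclic_shift(fmap, dy1 * dilation, dx1 * dilation)
        side_b = cyclic_shift(fmap, dy2 * dilation, dx2 * dilation)
        for i in range(h):
            for j in range(w):
                # Positive only when the centre is brighter (or darker) than
                # both opposite neighbours at the same time.
                score = (fmap[i][j] - side_a[i][j]) * (fmap[i][j] - side_b[i][j])
                out[i][j] = min(out[i][j], score)
    return out


# Usage: an isolated bright pixel versus a straight vertical step edge.
spot = [[0.0] * 7 for _ in range(7)]
spot[3][3] = 1.0
spot_contrast = local_contrast(spot)

edge = [[0.0] * 3 + [1.0] * 4 for _ in range(7)]
edge_contrast = local_contrast(edge)
```

Under this assumed rule, the isolated pixel keeps a positive response because it differs from its neighbors in every direction, while the step edge does not, since along the edge at least one opposite pair shows no difference. Increasing `dilation` compares each pixel with more distant neighbors, which is one way a shift-based layer can reach context beyond a small convolution kernel.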