| The segmentation of infrared small targets has received increasing attention because of its important applications in target surveillance, target search, object recognition, and other fields. However, owing to the characteristics of infrared remote imaging, the targets of interest in infrared images occupy few pixels and are accompanied by complex and varied backgrounds, strong clutter interference, and unclear feature information. It is therefore extremely difficult to segment dim and small infrared targets precisely in complex backgrounds using traditional feature engineering methods. Breakthroughs in deep learning have brought new hope for solving this problem. This thesis conducts in-depth research on deep-learning-based segmentation of dim and small targets in infrared images of complex scenes. To segment dim and small targets in complex and diverse image scenes in a more targeted way, this study first designs classification methods for complex scene images to distinguish different scene categories, and then designs segmentation methods for dim and small targets that adapt to the extraction of local information of interest in images of different scene categories. The main research work of this thesis is summarized as follows: (1) For scene images with diverse target types and complex spatial relationships, differences in local backgrounds or small-scale targets can affect scene representation. However, existing deep convolutional networks neglect the spatial correlations of different orders that underlie local semantic expression in scene images, and are therefore insufficient to extract discriminative scene features effectively. Therefore, an image scene classification method based on adaptive association of spatial local features is proposed in this thesis. First, multi-scale spatial features are extracted using deep convolutional networks, and then the semantic features of local regions in the spatial features are enhanced using
dual attention. At the same time, a local adaptive aggregation learning mechanism for attention enhancement is designed from different perspectives to mine more detailed local semantic expressions and complex spatial relationships among local features in each scale space. Finally, an attention mechanism for multi-scale sparse fusion is designed to extract discriminative image scene features by fusing local relational aggregation features from different scale spaces. (2) For complex scene images in which the target and its background vary in viewpoint, scale, shape, and distribution, existing deep convolutional networks do not study diversity-enhanced relationships among spatial features, and thus struggle to extract consistent scene features effectively. To this end, two different adaptive learning methods of diverse spatial relationships for image scene classification are proposed in this thesis. Specifically, a region-based diversity association learning approach is proposed from the perspective of collaborative fusion of diverse spatial region dependencies, which can not only mine arbitrary directional dependencies of regions in different spatial dimensions but also learn long-range strip-context dependencies of regions in four directions. Different attention mechanisms are also designed to refine these diverse spatial region relations. In addition, a global-local spatial associative learning approach is proposed from the perspective of adaptive fusion of diverse global-local spatial dependencies, which can mine the multiple depth dependencies among global attention features in multi-layer and multi-scale spatial features. At the same time, these multi-layer and multi-scale spatial features are gradually fused in different dimensions and enhanced locally, and the local semantic dependencies of the space are then mined in the aggregated features. (3) Small targets in infrared images have the characteristics of few available
pixels, lack of effective information, and variable scales. Existing deep networks often use multi-layer spatial feature fusion to extract features of small targets, but the effective modulation of local information of small targets in multi-scale space is often neglected during the fusion process. To this end, two different dynamic modulation pyramid aggregation methods for infrared small target segmentation are proposed in this thesis. Specifically, an asymmetric attention-guided pyramidal aggregation approach is proposed from the perspective of asymmetric multi-path feature fusion. First, an attention-enhanced multi-path pyramid structure is constructed to extract information about small targets in the multi-scale space of different layers. Then, asymmetric locally modulated multi-path inverted pyramid structures containing pair-wise associations and recursive associations are constructed. These inverted pyramid structures can dynamically highlight the local details and local semantics of small targets in the multi-scale space by modulating adjacent cross-layer features asymmetrically. In addition, a bidirectional symmetric attention modulation pyramid aggregation approach is proposed from the perspective of symmetric multi-path feature fusion. First, a bidirectional symmetric modulation structure is designed that cascades the channel dimension and the spatial dimension in sequence; it aggregates pyramidal features sequentially and hierarchically from global to local perspectives, allowing the details and semantics of small targets to interact and aggregate dynamically in the multi-scale spaces of different layers. Then, the similarity between local information and multiple contexts of different scales in the aggregated feature is learned to further highlight the local information of small targets. (4) The visual features of dim and small targets in infrared images are not salient and resemble background, noise, and other interferers. Most existing deep network
methods exploit attention models to enhance the discriminability of features, but ignore the local enhancement of small targets and the association and comparison of contexts at multiple granularity levels, making it difficult to distinguish small targets from similar interferers effectively. To this end, a multi-granularity feature collaborative interaction method is proposed for dim and small target segmentation. First, a local patch contextual embedding module is designed in different network layers to enhance the features of dim and small targets. Second, three cross-layer collaborative attention embedding modules with different granularities (i.e., point-level, region-level, and global-level) are designed in parallel to enhance the discriminability of dim and small target features by interactively aligning the information of the respective granularities in different layer features. Then, a hierarchical bidirectional "point-region-global" comparison mechanism in spatial location is designed to distinguish small targets from suspected interferers by exploring the associations and differences among the contextual information of these three granularities. Finally, a deeply supervised multi-objective joint learning strategy is designed to improve the distribution consistency of aligned features across layers and to assist the feature learning of small targets in the spaces of different layers. To verify the effectiveness of methods (1)-(4) proposed in this thesis, comparison experiments with existing related methods are conducted on several public image datasets, and the experimental results show that the proposed methods can effectively improve the performance of image classification and small target segmentation for complex scenes. Moreover, real infrared images of different marine environments are acquired for application experiments, and the experimental results show that the proposed methods can be effectively applied to specific scenarios
such as infrared small target segmentation in complex marine backgrounds. |
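
The abstract mentions a bidirectional symmetric modulation structure that cascades the channel dimension and the spatial dimension in sequence. The following is only a minimal illustrative sketch of that general idea, not the architecture of the thesis: it assumes two feature maps of identical shape `(C, H, W)`, uses parameter-free sigmoid gates in place of learned attention layers, and the names `modulate` and `bidirectional_fuse` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def modulate(target, guide):
    """Gate `target` with attention derived from `guide` (both shaped (C, H, W)).

    Assumption: parameter-free gates stand in for learned attention layers.
    """
    # Channel step: global average pooling of the guide gives one gate per channel.
    channel_gate = sigmoid(guide.mean(axis=(1, 2)))      # shape (C,)
    out = target * channel_gate[:, None, None]
    # Spatial step: the channel-wise mean of the guide gives one gate per pixel.
    spatial_gate = sigmoid(guide.mean(axis=0))           # shape (H, W)
    return out * spatial_gate[None, :, :]


def bidirectional_fuse(low, high):
    """Symmetric fusion: each layer is modulated by the other, then summed."""
    return modulate(low, high) + modulate(high, low)


# Toy inputs: a shallow-layer map of ones and a deep-layer map of zeros.
low = np.ones((2, 3, 3))
high = np.zeros((2, 3, 3))
fused = bidirectional_fuse(low, high)
```

In a real network the two maps would come from adjacent pyramid levels, with the coarser one upsampled first, and the gates would be learned; those details are not given in the abstract and are left out here.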