Multi-focus image fusion is an important branch of multi-source information fusion: it extends the depth of field of an optical lens by fusing a group of partially focused images of the same scene. An object exhibits complete edge and structure information only at the appropriate depth of field, so, given variations in depth of field and object distance, fully presenting the visual information of all objects of interest in a single image remains a difficult task. Conventional fusion methods, such as transform-domain and spatial-domain methods, use simple fusion strategies but are complicated to implement and are prone to artifacts or blocking effects. With advances in computer hardware and theory, deep learning has been widely adopted across many research fields, and deep-learning-based image fusion has accordingly become a hot topic. However, most current deep-learning-based methods require large labeled datasets for training, while for multi-focus image fusion no large dataset of natural multi-focus images exists and training data must be constructed by hand. Collecting and producing such datasets is therefore difficult and laborious; moreover, large datasets place heavy demands on training hardware and greatly increase training time. To address these problems, this thesis designs two unsupervised deep learning models for the multi-focus image fusion task.

(1) Based on the characteristics of densely connected networks (DenseNet), this thesis proposes an unsupervised model, the Multi-scale Convolutional Attention Residual Network (MCRD-Net), for multi-focus image fusion. In our network, a multi-scale feature extraction module extracts spatial details of the source images at different scales, a convolutional block attention module selects useful deep features, and a residual module optimizes network performance; together, these three modules allow the network to extract both shallow and deep features of the source images effectively. In addition, we use a Gaussian-based Sum-Modified-Laplacian (GSML) to measure the activity level of the feature maps and generate the decision map. We analyzed the proposed method in terms of visual quality and objective metrics; experimental results show that it outperforms nine competing image fusion methods.

(2) Based on the characteristics of the Transformer and U-Net, this thesis proposes an unsupervised model, the Unsupervised Transformer U-Net Fuse Network (UTUFuse-Net), for multi-focus image fusion. U-Net has achieved great success in medical image segmentation, but owing to the inherent locality of convolution it struggles to model long-range dependencies. The Transformer, designed for sequence-to-sequence prediction, uses a global self-attention mechanism that effectively captures global dependencies and low-frequency spatial details. Our model combines the advantages of the Transformer and U-Net to extract image features effectively. Extensive experiments show that our method surpasses state-of-the-art methods in both subjective visual quality and quantitative metrics.
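To make the GSML-based decision step concrete, the following is a minimal NumPy sketch of a Sum-Modified-Laplacian focus measure with Gaussian weighting over a local window. The window size, sigma, and all function names here are illustrative assumptions, not the thesis's exact implementation, which operates on deep feature maps rather than raw grayscale images.

```python
import numpy as np

def modified_laplacian(img):
    """Per-pixel modified Laplacian: |2I - I_left - I_right| + |2I - I_up - I_down|."""
    p = np.pad(img, 1, mode="edge")
    ml_x = np.abs(2 * p[1:-1, 1:-1] - p[1:-1, :-2] - p[1:-1, 2:])
    ml_y = np.abs(2 * p[1:-1, 1:-1] - p[:-2, 1:-1] - p[2:, 1:-1])
    return ml_x + ml_y

def gaussian_kernel(size=7, sigma=1.5):
    """Normalized 2-D Gaussian window (assumed parameters)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def gsml(img, size=7, sigma=1.5):
    """Gaussian-weighted sum of the modified Laplacian over a local window."""
    ml = modified_laplacian(img)
    k = gaussian_kernel(size, sigma)
    h, w = ml.shape
    p = np.pad(ml, size // 2, mode="edge")
    out = np.zeros_like(ml)
    # Correlate the activity map with the Gaussian window (plain shifted sum).
    for i in range(size):
        for j in range(size):
            out += k[i, j] * p[i:i + h, j:j + w]
    return out

def decision_map(feat_a, feat_b):
    """1 where source A is judged in focus (higher activity), 0 where source B is."""
    return (gsml(feat_a) >= gsml(feat_b)).astype(np.float32)
```

The fused image is then assembled by taking each pixel from whichever source the decision map selects, typically after smoothing or consistency verification of the map.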