| With advances in science, technology, and computer applications, AI-assisted techniques are now used in many areas of society. Lung X-ray images play an important role in the diagnosis and treatment of lung disease and help doctors assess a patient's condition more accurately. Accurate segmentation of lung medical images with deep learning methods can help doctors identify the causes of disease more efficiently, reduce the workload of medical personnel, and improve work efficiency. This paper addresses deep-learning-based semantic segmentation of lung X-ray images. The main work is as follows: (1) Based on the Swin Transformer and U2-Net model structures, an improved U-Net network structure is proposed that combines U-Net with Swin Transformer. The improved U-Net is organized as an inner and an outer structure. The outer structure is U-shaped and consists of six encoders and five decoders, and each encoder and decoder is itself an inner structure. Five of the encoders are symmetrical with the five decoders, and the remaining encoder connects the symmetrical encoders and decoders. The first two encoders and the last two decoders use a U-shaped structure in which the Swin Transformer network replaces the VGG feature extraction module of the original U-Net; the middle two encoders and the middle two decoders use the U-Net model; and the last two encoders and the first decoder use a U-shaped structure built from dilated convolutions. These networks serve as the inner layer, and a number of such inner modules are stacked to form the U-shaped outer structure. The improved U-Net has a deeper architecture than U-Net, and the experimental results show that it extracts multi-scale features more comprehensively and performs well on targets that demand high accuracy, such as lung medical images. (2) A hybrid attention mechanism is added at the skip connections between encoder and decoder, in both the outer and inner layers, wherever the improved U-Net does not use the Swin Transformer module. This allows a more accurate secondary feature extraction and facilitates the subsequent feature fusion. All upsampling structures are replaced with transposed convolution so that the upsampling parameters can be learned. The downsampling modules between the early encoders, where the feature maps are large, are replaced with depthwise separable convolution to reduce computation while still downsampling. Once repeated downsampling has made the feature maps small, the subsequent downsampling modules are replaced with dilated convolution to prevent the loss of important information. (3) Image preprocessing is improved. Because lung medical images require very high segmentation accuracy, targeted preprocessing of the lung X-ray images is needed, and preprocessing by cropping and Instance Norm normalization is proposed to improve segmentation accuracy. Experiments in which all models are trained and tested on the same self-built lung X-ray image dataset show that, compared with U-Net, U2-Net, and Swin-Unet, the improved U-Net raises the F1 score by 5.65%, 3.44%, and 3.19%, respectively, reduces the mean absolute error by 2%, 0.3%, and 0.8%, respectively, and reduces the HD95 by 6.9%, 1.5%, and 0.7%, respectively. The segmentation performance is significantly improved. |
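
The abstract reports F1 score and mean absolute error (MAE) but does not state how they are computed. The following is a minimal sketch, assuming the standard pixel-wise definitions on binary lung masks; the function names `f1_score` and `mean_absolute_error` and the toy masks are hypothetical and not taken from the paper. HD95 is omitted because it requires boundary distance computations.

```python
def f1_score(pred, target):
    """Pixel-wise F1 (equal to the Dice coefficient) between two binary masks.

    Both masks are nested lists of 0/1 values with identical shape.
    Assumption: an empty prediction on an empty target counts as a perfect match.
    """
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # lung pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as lung
            elif t and not p:
                fn += 1          # lung pixel missed
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def mean_absolute_error(pred, target):
    """Mean absolute per-pixel difference between a prediction and the target.

    `pred` may hold probabilities in [0, 1] or hard 0/1 labels.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("masks must contain at least one pixel")
    return total / count


# Toy 2x3 masks (illustrative values only, not data from the paper).
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]

f1 = f1_score(pred_mask, true_mask)
mae = mean_absolute_error(pred_mask, true_mask)
print(f"F1 = {f1:.4f}, MAE = {mae:.4f}")
```

For the toy masks there are two true positives, one false positive, and one false negative, so F1 is 4/6 and MAE is 2/6. A higher F1 and a lower MAE both indicate a better segmentation, which matches the direction of the improvements reported above.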