Image segmentation is a vital research direction in the field of computer vision. With the increasing emphasis on health, medical image segmentation has become a popular branch of image segmentation research. Faced with a variety of medical imaging technologies, traditional expert manual diagnosis consumes considerable manpower and material resources yet has limited effect. It is therefore particularly important to develop computer-aided diagnosis systems that support the clinical prevention, diagnosis, and segmentation of potential lesions. Most traditional image segmentation methods are based on a priori rules such as shape, texture, and boundary, and the results they obtain on complex medical image segmentation tasks are unsatisfactory. With the rapid development of deep learning, many deep learning methods have been proposed for medical image segmentation. Compared with traditional methods, deep learning methods have achieved satisfactory results when dealing with blurred objects and scale changes in medical images. To better address the problems of multi-scale variation, noise interference, and coarse results in medical image semantic segmentation, this paper proposes feasible model improvements based on the classical encoder-decoder network structure and achieves better segmentation performance and robustness. The main work of this paper is summarized as follows:

1) To address the problems of coarse prediction results and poor feature representation in medical image segmentation, this paper studies a segmentation structure that combines an encoder-decoder network with an attention mechanism. First, the U-Net structure is introduced, which extracts high-level semantic features of the target through down-sampling in the encoder stage and recovers high-resolution segmentation details through up-sampling and skip connections in the decoder stage. Second, the convolutional block attention module is introduced, which applies attention along the channel dimension and the spatial dimension respectively, focusing on the channels that contribute most to the segmentation result and on feature-dense spatial regions while filtering out background and noise interference. Finally, a segmentation model combining the encoder-decoder network and the attention mechanism is proposed, which optimizes the intermediate feature maps along the channel and spatial dimensions to obtain more accurate and detailed segmentation results. Experimental results on four medical image datasets show that, compared with the baseline U-Net, this model achieves more accurate segmentation and stronger feature representation.

2) To address the multi-scale variation and insufficient use of spatial information faced by medical image segmentation, this paper proposes a multi-scale segmentation model combining aggregate connections with an attention mechanism. First, an aggregate connection strategy is used: although the encoder-decoder structure employs skip connections to help recover high-resolution details, semantic gaps may exist between the corresponding encoder and decoder stages. Aggregate connections bridge this semantic gap and fuse feature information of different scales and depths, which benefits the recovery of prediction details. Second, a multi-channel convolution module is used: the serial convolution structure with residual connections is expanded into a multi-channel parallel structure. The channels complement one another and provide different spatial information, helping the model maintain segmentation accuracy in multi-object and multi-scale situations. Finally, a novel model is proposed by fusing the aggregate connections, the multi-channel module, and the attention mechanism within the encoder-decoder structure. Comparative experiments with multiple segmentation networks on four medical image datasets show that the proposed model has better segmentation performance and stability in the face of multi-scale variation and noise interference.

3) To overcome the inherent limitations imposed by the limited receptive field of CNN models and their poor ability to build long-range dependencies, this paper proposes an encoder-decoder segmentation model that combines a Transformer with a CNN. First, a hybrid CNN-Transformer encoder is used, which retains both the ability of the CNN to extract semantic features and recover high-resolution details and the ability of the Transformer's self-attention mechanism to establish long-range dependencies. Second, the convolutional block attention module is introduced to optimize intermediate feature maps along both the channel and spatial dimensions. Finally, a segmentation model combining the Transformer and the CNN is proposed. The model follows the encoder-decoder design, uses the hybrid CNN-Transformer encoder, and fuses in the convolutional block attention module to aid segmentation. Comparative experiments on the Synapse multi-organ segmentation dataset demonstrate the effectiveness of the proposed model and show that an encoder-decoder model combining Transformer and CNN can obtain accurate medical image segmentation results.
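The convolutional block attention module that recurs in all three contributions can be sketched as two sequential gating steps: a channel attention map computed from global average- and max-pooled descriptors passed through a shared MLP, followed by a spatial attention map computed from channel-wise pooling. The numpy sketch below is illustrative only, not the thesis implementation: the MLP weights are random stand-ins for learned parameters, and the spatial branch omits the learned 7x7 convolution that the full module applies to the pooled maps.

```python
import numpy as np

def channel_attention(x, reduction=2, seed=0):
    """Gate each channel of x (shape (C, H, W)) by a sigmoid attention weight."""
    c = x.shape[0]
    avg = x.mean(axis=(1, 2))   # global average pooling -> (C,)
    mx = x.max(axis=(1, 2))     # global max pooling -> (C,)
    # Shared two-layer MLP with a reduction bottleneck.
    # Random weights here; in the real module these are learned.
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)       # ReLU hidden layer
    att = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))  # sigmoid of summed descriptors
    return x * att[:, None, None]                      # broadcast over H, W

def spatial_attention(x):
    """Gate each spatial location of x (shape (C, H, W)); the learned 7x7 conv is omitted."""
    avg = x.mean(axis=0, keepdims=True)   # channel-wise average pooling -> (1, H, W)
    mx = x.max(axis=0, keepdims=True)     # channel-wise max pooling -> (1, H, W)
    att = 1.0 / (1.0 + np.exp(-(avg + mx)))
    return x * att                        # broadcast over C

# Apply channel attention first, then spatial attention, as in the module's design.
x = np.random.default_rng(1).standard_normal((8, 4, 4))
y = spatial_attention(channel_attention(x))
print(y.shape)  # (8, 4, 4) — the feature map shape is preserved
```

Because both steps only rescale the input by values in (0, 1), the module can be dropped between any two layers of the encoder-decoder without changing tensor shapes, which is what makes it easy to fuse into U-Net-style architectures.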