Cardiovascular disease is among the leading causes of death worldwide, so automatically extracting and segmenting pathological regions in cardiac images is important for the diagnosis and treatment of cardiovascular diseases. With the growing use of deep learning in image processing, medical image segmentation has advanced rapidly, markedly improving the accuracy and effectiveness of disease diagnosis and treatment. However, traditional convolutional neural network (CNN) models have fixed-size receptive fields, which limits their feature extraction ability and leads to unsatisfactory segmentation performance in CNN-based methods. Moreover, because cardiac images have complex structure and adjacent sub-regions often adhere to one another, models struggle to delineate target-region boundaries accurately. In addition, in clinical diagnosis different imaging modalities play complementary roles, and a single modality rarely provides complete information. To address these issues, this paper exploits the characteristics of multi-modal cardiac MRI images and designs deep-learning-based multi-modal segmentation algorithms to improve segmentation performance on multi-modal cardiac MRI images. The research work of this paper is as follows:

(1) To address the insufficient use of multi-modal image information by classical networks in cardiac MRI segmentation, this paper proposes a multi-modal cardiac MR image segmentation model, NVTrans-UNet. The model first applies data augmentation to the multi-modal cardiac MRI images to increase data diversity. In the encoding stage, a hierarchical neighborhood vision Transformer performs feature embedding and downsampling with overlapping small convolutional kernels, focusing more on local information at lower complexity. Second, a multi-modal gated fusion
network is introduced into each Transformer layer of the encoder; it fuses the task-relevant feature maps of the different modalities and automatically learns to adjust each modality's contribution weights, helping the model fully exploit multi-modal information and strengthening its feature representation ability. Finally, a bottleneck layer with atrous spatial pyramid pooling (ASPP) is added between the encoder and decoder to capture information accurately at multiple scales, improving the representation of fine details and the segmentation of small target regions. Experiments demonstrate that the NVTrans-UNet model significantly improves the segmentation of pathological regions in cardiac images and yields clear gains on the evaluation metrics.

(2) To address the difficulty classical networks have in capturing long-range dependencies in images and in segmenting the fuzzy boundaries of cardiac images, a model combining global and local attention mechanisms, GCT-UNet, is proposed. The GCT-UNet model first uses a combination of global and local attention modules to model long-range and short-range spatial interactions effectively, which extracts richer, more discriminative features and strengthens the correlations among them. In addition, the network is optimized with a mixed loss function that alleviates class imbalance while focusing the model on the boundaries of pathological regions, improving the accuracy and completeness of the segmentation results. Experiments show that the GCT-UNet model effectively improves the segmentation accuracy of pathological regions in multi-modal cardiac images and yields clear improvements on the main evaluation metrics.
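The abstract describes feature embedding and downsampling via overlapping small convolutional kernels in the encoder. As an illustration only, here is a minimal NumPy sketch of the overlapping-patch idea: patches are extracted with a stride smaller than the kernel size, so neighbouring patches share pixels. The function name, kernel size, and the mean-pooling projection are hypothetical stand-ins for the learned convolution in the actual model.

```python
import numpy as np

def overlap_patch_embed(x, kernel=7, stride=4):
    """Overlapping patch embedding sketch for a single-channel (H, W) map.

    Because stride < kernel, adjacent patches overlap and local context
    is shared across patch boundaries. Each patch is reduced with a mean,
    standing in for the model's learned convolutional projection.
    """
    H, W = x.shape
    pad = kernel // 2
    xp = np.pad(x, pad)  # 'same'-style padding so border patches are full-size
    rows = range(0, H, stride)
    cols = range(0, W, stride)
    out = np.array([[xp[i:i + kernel, j:j + kernel].mean() for j in cols]
                    for i in rows])
    return out  # downsampled map of shape (ceil(H/stride), ceil(W/stride))
```

With a non-overlapping embedding (stride equal to kernel size), each patch would see a disjoint block of pixels; the overlap here is what lets the embedding keep more local detail at low cost.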
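The multi-modal gated fusion network is described as fusing task-relevant feature maps and learning per-modality contribution weights. A minimal NumPy sketch of that gating idea follows; the `gate_weights` projection is a hypothetical stand-in for the learned gating layer, and the pooling/softmax structure is an assumption about one common way such gates are built, not the paper's exact design.

```python
import numpy as np

def gated_fusion(feats, gate_weights):
    """Fuse per-modality feature maps with softmax-normalized gates.

    feats: list of M arrays, each (C, H, W), one per modality.
    gate_weights: (M, C) array standing in for a learned gating projection.
    Returns the fused (C, H, W) map and the M gate values (sum to 1).
    """
    # Summarize each modality by global average pooling -> (M, C)
    desc = np.stack([f.mean(axis=(1, 2)) for f in feats])
    # One gate logit per modality from the (hypothetical) learned projection
    logits = (desc * gate_weights).sum(axis=1)        # (M,)
    gates = np.exp(logits - logits.max())
    gates = gates / gates.sum()                       # softmax over modalities
    # Contribution-weighted sum of the modality feature maps
    fused = sum(g * f for g, f in zip(gates, feats))
    return fused, gates
```

Because the gates are data-dependent, a modality whose features are more informative for the current input can automatically receive a larger weight, which is the behaviour the abstract attributes to the fusion network.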
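The bottleneck layer uses atrous spatial pyramid pooling (ASPP) to capture information at different scales. The sketch below shows the mechanism with a naive single-channel dilated convolution and an image-level pooling branch; the dilation rates (a common ASPP choice) and the plain Python loops are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np

def dilated_conv3x3(x, kernel, rate):
    """Naive 3x3 dilated (atrous) convolution on an (H, W) map, 'same' output.

    A dilation rate r spreads the 3x3 taps over a (2r+1)x(2r+1) window,
    enlarging the receptive field without adding parameters.
    """
    H, W = x.shape
    xp = np.pad(x, rate)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[i * rate:i * rate + H,
                                     j * rate:j * rate + W]
    return out

def aspp(x, kernels, rates=(1, 6, 12, 18)):
    """ASPP sketch: parallel dilated branches at several rates, plus a
    global-average (image-level) branch, stacked along a channel axis.
    Each rate probes the image at a different effective scale."""
    branches = [dilated_conv3x3(x, k, r) for k, r in zip(kernels, rates)]
    branches.append(np.full_like(x, x.mean()))  # image-level context
    return np.stack(branches)  # (len(rates) + 1, H, W)
```

In the real network the stacked branches would be projected back to the working channel width by a 1x1 convolution; the multi-rate structure is what lets the bottleneck see both small targets and large context at once.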
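GCT-UNet combines global and local attention to model long-range and short-range spatial interactions. The NumPy sketch below contrasts the two branches on a token sequence: the global branch attends over all positions, the local branch only within fixed windows. Averaging the two branches, the window size, and single-head attention are all simplifying assumptions; the actual module's combination rule is not specified in the abstract.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product self-attention; q, k, v have shape (N, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def global_local_attention(x, window=4):
    """Sketch of combining the two interaction ranges.

    Global branch: every token attends to every other token (long-range).
    Local branch: tokens attend only within non-overlapping windows
    (short-range). The average of the two stands in for the combined module.
    """
    global_out = attention(x, x, x)
    local_out = np.zeros_like(x)
    for s in range(0, x.shape[0], window):
        blk = x[s:s + window]
        local_out[s:s + window] = attention(blk, blk, blk)
    return (global_out + local_out) / 2
```

The design intuition matches the abstract: the global branch supplies the long-range dependencies CNNs miss, while the local branch keeps sharp short-range detail near region boundaries.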
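The mixed loss is said to alleviate class imbalance while focusing on the boundaries of pathological regions. One plausible instantiation, sketched below, pairs a soft Dice term (robust to imbalance because it normalizes by region size) with a boundary-weighted cross-entropy term; the weighting scheme, the boundary weight of 5.0, and the 0.5 mixing coefficient are hypothetical choices, not the paper's actual loss.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary masks; insensitive to class imbalance
    because it scores overlap relative to region size, not pixel count."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def boundary_weights(target, w_boundary=5.0):
    """Up-weight pixels whose 4-neighbourhood crosses a label boundary."""
    edge = np.zeros(target.shape, dtype=bool)
    edge[:, 1:] |= target[:, 1:] != target[:, :-1]
    edge[:, :-1] |= target[:, 1:] != target[:, :-1]
    edge[1:, :] |= target[1:, :] != target[:-1, :]
    edge[:-1, :] |= target[1:, :] != target[:-1, :]
    w = np.ones(target.shape)
    w[edge] = w_boundary  # hypothetical boundary weight
    return w

def mixed_loss(pred, target, alpha=0.5, eps=1e-7):
    """alpha * Dice + (1 - alpha) * boundary-weighted BCE (a sketch)."""
    p = np.clip(pred, eps, 1.0 - eps)
    bce = -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))
    weighted_bce = (boundary_weights(target) * bce).mean()
    return alpha * dice_loss(pred, target) + (1.0 - alpha) * weighted_bce
```

The Dice term keeps small pathological regions from being swamped by the background class, while the boundary weighting concentrates the pixel-wise penalty exactly where the abstract says segmentation is hardest: on fuzzy region boundaries.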