| Colorectal cancer (CRC) is a common adenocarcinoma that arises mainly in the colon or rectum. An estimated 80%-95% of colon cancers develop from adenomatous polyps. Colonoscopy is the most effective clinical method for detecting polyps, but manual detection is time-consuming and labor-intensive, and polyps may be missed or misidentified. Using deep learning to segment colorectal polyp images accurately and assist doctors during observation therefore supports the early diagnosis of colorectal diseases. However, existing deep learning segmentation networks have many parameters and long inference times, which limits their efficiency in practical applications. This paper addresses these problems. First, in response to the complex structure and large parameter count of traditional segmentation models, this paper proposes a lightweight segmentation network (MVT) based on MobileViT. Unlike traditional segmentation networks, it combines the advantages of convolution and the Transformer to extract local and global features simultaneously. To reduce the number of parameters and the amount of computation, depthwise separable convolution replaces the traditional convolution module and is combined with the inverted residual structure and the SiLU activation function to form an MN module. In addition, to exploit the Transformer's ability to represent global image information, an MVT module combining convolution and Transformer structures is used. The network uses a combination of MN and MVT modules as the encoder for feature extraction and bilinear interpolation as the decoder, which upsamples step by step while fusing the encoder feature maps, and finally outputs the segmentation result. This paper compares MVT modules with different numbers of layers on colonoscopic polyp image segmentation and selects the 
optimal structure for experimental comparison with existing segmentation models. The results show that the proposed algorithm outperforms existing models in segmentation accuracy while keeping the number of parameters and the computational complexity small. Second, in response to the large parameter count and high number of floating-point operations of the Vision Transformer, this paper proposes a lightweight segmentation network based on the Swin Transformer (Swin-MVT). Adding an ECA attention module after the depthwise separable convolution strengthens the representation of important channel features. The Swin Transformer module is also introduced: its window partitioning reduces the number of parameters, and its shifted-window design allows information otherwise confined to a window to flow across windows. Comparative experiments show that this algorithm both improves segmentation accuracy and reduces the model's parameter count and computational complexity, making it more practical. Finally, because medical image processing tasks offer little data and high-quality manually labeled data are difficult to obtain, this paper combines the Noisy Student training method with the Swin-MVT network and proposes a Semi-Swin-MVT network model, which generates pseudo-labels for a large number of unlabeled images and trains on them together with the labeled data. Comparative experiments show that this approach helps reduce overfitting and improve training accuracy. It alleviates the heavy dependence on labeled data in current medical image segmentation tasks and is better suited to practical colonoscopy polyp image segmentation. |
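
The parameter saving that motivates replacing standard convolution with depthwise separable convolution can be illustrated with a short calculation. The sketch below is not the paper's implementation; the function names and the layer sizes (64 input channels, 128 output channels, a 3 x 3 kernel) are assumptions chosen only for illustration, and bias terms are omitted.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a depthwise k x k convolution plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from the paper.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 2))  # 73728 8768 8.41
```

For these assumed sizes the depthwise separable layer needs roughly one eighth of the weights of the standard layer; the actual saving in the proposed networks depends on their channel widths and kernel sizes.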