The glomerular filtration barrier (GFB) plays an important role in the filtration of human blood. Pathological changes in the three-layer GFB ultrastructure observed under transmission electron microscopy (TEM) provide significant evidence for diagnosing kidney diseases. Developing an automatic GFB segmentation algorithm and enabling quantitative morphological analysis can improve pathologists' diagnostic efficiency for kidney diseases. Current research focuses on designing new deep model architectures to achieve accurate GFB segmentation, but the limitation on model performance caused by annotation scarcity is largely ignored. Self-supervised representation learning is an effective way to overcome the scarcity of labeled data: by constructing a suitable self-supervised pretext task, a model is pre-trained on a large amount of unlabeled data to learn representations that benefit the downstream GFB segmentation task. Various pretext tasks have been proposed, which can be roughly divided into three types: contrastive, generative, and predictive. However, none of them considers the characteristics of glomerular TEM images, and their effectiveness on the downstream GFB segmentation task has not been verified.

Based on self-supervised representation learning, this study proposes a new contrastive pretext task, USRegCon, tailored to the characteristics of glomerular TEM images. USRegCon has three innovations. (1) Adaptive region division: the various ultrastructures are divided into different image regions, and the division is continuously optimized as the model trains; this division method is better suited to glomerular TEM images with diverse content. (2) Two-level region representation extraction: first-order grayscale region representations and deep semantic region representations are extracted from each region, encouraging the model to learn richer ultrastructural information. (3) Multiple contrast strategies: specific contrast strategies are tailored for the different region representations, helping the model learn similar or dissimilar representations for the ultrastructures in each region and thereby improving its structure-identification ability.

Considering that GFB segmentation involves structure identification in the global field of view and fine delineation in the local field of view, this study also proposes a new generative pretext task, GCLR. GCLR has three innovations. (1) Efficient integration of subtasks: GCLR integrates a global clustering (GC) subtask and a local restoration (LR) subtask, improving model performance without increasing the consumption of computing resources. (2) Global representation learning: the GC subtask requires the model to generate clustering images in the global field of view; a global clustering loss with an adjustable field of view trains the model to learn global context representations that benefit structure identification. (3) Local representation learning: the LR subtask requires the model to restore locally perturbed regions of the image; the block-shuffling perturbation used in the LR subtask preserves the global pixel distribution of the image, prompting the model to learn local detail representations that benefit fine segmentation.

To verify their effectiveness, two groups of comparison experiments were conducted: USRegCon against other contrastive pretext tasks, and GCLR against other generative pretext tasks. USRegCon and GCLR each achieved the best performance in their respective groups. In addition, predictive pretext tasks, multi-pretext tasks, and fully supervised pre-training tasks were included in the comparison. In particular, GCLR achieved the best performance in improving model performance, increasing annotation benefit, and reducing training time, and has the potential to replace traditional fully supervised pre-training based on three large-scale public labeled datasets: MitoEM, COCO, and ImageNet.
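The block-shuffling perturbation described for the LR subtask rearranges image blocks without altering any pixel values, which is why the global pixel distribution is preserved. The following is a minimal illustrative sketch, not the authors' implementation; the function name, NumPy-based formulation, and fixed block size are assumptions.

```python
import numpy as np

def block_shuffle(image: np.ndarray, block_size: int, seed: int = 0) -> np.ndarray:
    """Randomly permute non-overlapping blocks of a 2-D grayscale image.

    Blocks are only rearranged, never modified, so the global pixel-value
    distribution of the image is unchanged; only local structure is perturbed.
    """
    h, w = image.shape
    assert h % block_size == 0 and w % block_size == 0, "image must tile evenly"
    bh, bw = h // block_size, w // block_size
    # Reshape into a grid of blocks: (bh, bw, block_size, block_size)
    blocks = image.reshape(bh, block_size, bw, block_size).transpose(0, 2, 1, 3)
    flat = blocks.reshape(bh * bw, block_size, block_size)
    # Shuffle the block order with a reproducible generator
    rng = np.random.default_rng(seed)
    shuffled = flat[rng.permutation(bh * bw)]
    # Reassemble the grid back into an image of the original shape
    grid = shuffled.reshape(bh, bw, block_size, block_size)
    return grid.transpose(0, 2, 1, 3).reshape(h, w)
```

The restoration target is simply the unperturbed image, so the model must recover local detail from the shuffled input.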