| Wireless capsule endoscopy (WCE) is a safe, convenient, and comfortable digestive health screening tool that records images with a miniature camera and transmits them wirelessly to a display screen, where a physician can make a diagnosis and issue a diagnostic report. However, because the human stomach is a large cavity, passive WCE can leave blind spots in the stomach. Additionally, low-quality and redundant images in the WCE video increase the burden on doctors and raise inspection costs. This thesis proposes a temporal multi-label classification network for anatomical location classification, a fine-grained multi-lesion detection network for detecting multiple gastrointestinal diseases, and a keyframe extraction approach called Frame Importance-Assisted Sparse Subset Selection (FIAS3) to aid quick browsing of the entire video. The research work of this thesis is as follows: (1) To address the incomplete coverage of the stomach during passive WCE inspections, this thesis proposes a multi-label classification network for gastric anatomical location based on a temporal transformer. Unsupervised pre-training is employed to conduct representation learning on a large-scale, unlabeled WCE video dataset, and classification performance is further improved by fine-tuning on a labeled gastric anatomical location dataset. After fine-tuning, the classification network achieves micro-average, macro-average, and example-wise F1 scores of 0.827, 0.820, and 0.859, respectively, which are significantly higher than those of the baseline model. These results demonstrate that the proposed network effectively improves the classification of gastric anatomical locations in WCE. (2) To address the low detection rates of small and rare lesions, this thesis proposes a fine-grained automatic multi-lesion detection network. The network employs a multi-scale transformer backbone to extract multi-scale global information and an anchor-free detection head to increase the proportion of positive samples. Unsupervised pre-training is applied to further enhance detection performance. The mAP50 over the 11 lesion detection categories in the dataset is 0.600, which is 0.178 higher than that of the existing method. These results demonstrate that the proposed method has significant advantages for fine-grained multi-lesion detection. (3) To address the heavy workload doctors face when reading WCE images, this thesis proposes the Frame Importance-Assisted Sparse Subset Selection (FIAS3) approach. FIAS3 is subject to three constraints: sparsity, similarity inhibition, and frame importance. The keyframes selected by FIAS3 can be used to generate short video summaries, which help doctors quickly browse WCE videos. To verify the effectiveness of the proposed method, the coverage of lesions and anatomical landmarks (Coverage) and the video reconstruction error (VRE) were evaluated on public and private datasets. Specifically, at a compression rate of 90%, the Coverage and VRE are 92% and 0.143, respectively, which are at least 16.9% and 0.031 better than those of existing methods. These results demonstrate that, at the same compression rate, the keyframes extracted by FIAS3 are of higher quality than those extracted by existing methods. |
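
The three F1 variants reported for the multi-label anatomical location classifier differ only in how counts are aggregated. Micro-averaging pools true positives, false positives, and false negatives over all labels, macro-averaging averages the per-label F1, and example-wise averaging averages the per-frame F1. The following is a minimal sketch of these standard definitions, not the evaluation code used in the thesis. The function names, the toy label matrices, and the convention that F1 is 0 when a denominator is empty are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 for an empty denominator (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def counts(true: Sequence[int], pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives for binary vectors."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    return tp, fp, fn


def multilabel_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    """Compute micro, macro, and example-wise F1 for binary indicator matrices.

    Rows are samples (frames or clips), columns are labels (anatomical locations).
    """
    n_labels = len(y_true[0])

    # Per-label counts: one (tp, fp, fn) triple per column.
    per_label = [
        counts([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    # Per-sample counts: one (tp, fp, fn) triple per row.
    per_sample = [counts(t, p) for t, p in zip(y_true, y_pred)]

    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1_from_counts(
        sum(c[0] for c in per_label),
        sum(c[1] for c in per_label),
        sum(c[2] for c in per_label),
    )
    # Macro: unweighted mean of the per-label F1 scores.
    macro = sum(f1_from_counts(*c) for c in per_label) / n_labels
    # Example-wise: unweighted mean of the per-sample F1 scores.
    example = sum(f1_from_counts(*c) for c in per_sample) / len(per_sample)

    return {"micro": micro, "macro": macro, "example": example}


# Toy data (hypothetical): 3 samples, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]
scores = multilabel_f1(y_true, y_pred)
print(scores)
```

Macro-averaging weights every label equally, so it is more sensitive than micro-averaging to rarely occurring labels. This may be one reason for reporting all three scores side by side.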