| Computed tomography (CT) is a widely used imaging method in clinical diagnostics that scans the body and reconstructs tomographic images of its internal structures, providing accurate information for disease detection and assessment. However, the X-ray dose required for CT imaging increases the risk of radiation damage to the patient, so low-dose CT imaging has received increasing attention. Low-dose CT, in turn, yields lower image quality than conventional-dose CT, mainly in the form of increased noise and artifacts. These problems may impair the physician’s judgment and analysis of the image content, reducing the diagnostic value of CT in the clinical setting. Noise and artifacts in low-dose CT images must therefore be reduced effectively to balance image quality against radiation safety, an important challenge to address before low-dose CT can be applied clinically. Among traditional methods, iterative reconstruction algorithms that use physical models with a priori knowledge have achieved some success in reducing artifacts and noise. However, because of hardware limitations and high computational costs, it is difficult to further improve the reconstruction efficiency of these algorithms on commercial CT scanners. Deep learning, with its excellent image quality, fast processing speed, and strong generalization ability, has gradually come to dominate low-dose CT image denoising. Most earlier methods used convolutional neural networks (CNNs) to suppress image noise and achieved good results. However, because their receptive field is limited, CNNs cannot fully capture the global information of an image, which directly affects the recovery of structural information in denoised images. In recent years, Transformer models have shown outstanding performance in computer vision tasks, and some researchers have applied them to low-dose CT image denoising. Compared with CNNs, Transformer models can
better capture global information and long-range feature interactions, thus obtaining richer image information. Moreover, their self-attention mechanism offers high visual interpretability. However, the Transformer model also has drawbacks. The high computational complexity of its self-attention mechanism poses a significant challenge for clinical applications. At the same time, Transformer models are not as effective as CNNs at extracting local information. Motivated by these challenges, this paper examines the network design limitations of existing deep learning-based methods for low-dose CT image noise reduction. The main work of this paper is as follows. (1) This paper proposes a neural network model based on an autoencoder with a hierarchical structure, which combines the advantages of convolutional neural networks and the Transformer architecture to enhance the model’s feature extraction capability. While exploiting the advantages of self-attention in capturing long-range feature interactions, this paper uses convolutional operations to assist the network in fine-grained feature extraction, and combines convolution and Transformer in a hierarchical manner according to their different abilities to process high- and low-frequency information in images, achieving hierarchical fusion of local and global features and improving the network’s noise reduction capability. (2) This paper introduces a patch-based lightweight global feature extraction module that uses cross-covariance attention to perform self-attention over channel features instead of global features. This approach retains the benefits of self-attention in learning global features and capturing intrinsic visual features, while substantially reducing model overhead by shrinking the computation of the self-attention operation from $O(N^2 d)$ to $O(N d^2)$ (for $N$ image tokens with $d$ channels), which is linear in the image resolution. The method not only effectively lowers computational complexity but also
implicitly encodes non-local contextual information and enhances feature representation. |
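The channel-wise attention described in contribution (2) can be illustrated with a minimal sketch. It is not the paper's module: it assumes a single head, no learned projections, and a fixed temperature `tau`, and all function names are hypothetical. It follows the general form of cross-covariance attention, in which the attention map relates channels to channels (size d × d) rather than tokens to tokens (size N × N), so the cost grows linearly with the number of tokens N.

```python
import math


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def l2_normalize_rows(m, eps=1e-12):
    """Scale every row to unit L2 norm (eps guards against all-zero rows)."""
    out = []
    for row in m:
        norm = math.sqrt(sum(x * x for x in row)) + eps
        out.append([x / norm for x in row])
    return out


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_covariance_attention(q, k, v, tau=1.0):
    """Single-head channel attention sketch (assumed form, not the paper's code).

    q, k, v: N x d matrices (N tokens, d channels).
    Returns the N x d output and the d x d channel attention map.
    """
    # Channel-major layout (d x N): each row holds one channel over all tokens.
    q_c = l2_normalize_rows(transpose(q))
    k_c = l2_normalize_rows(transpose(k))
    v_c = transpose(v)
    # d x d channel-to-channel similarities, scaled by the temperature.
    scores = [[s / tau for s in row] for row in matmul(q_c, transpose(k_c))]
    attn = softmax_rows(scores)
    # Mix channels: (d x d) @ (d x N) -> d x N, then return to N x d.
    return transpose(matmul(attn, v_c)), attn


def attention_map_entries(n_tokens, n_channels):
    """Entries in the attention map for token-wise vs channel-wise attention."""
    return {"token": n_tokens * n_tokens, "channel": n_channels * n_channels}
```

For example, with 4096 tokens and 64 channels (illustrative values only), `attention_map_entries(4096, 64)` gives 16,777,216 entries for token-wise attention against 4,096 for channel-wise attention. The channel map keeps the same size as the image grows, which is the source of the linear scaling.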