In recent years, driven by recurrent epidemics, the volume of patients' medical image data has grown steadily. Traditional image analysis methods are time-consuming and labor-intensive, sharply increasing radiologists' workload and the risk of human error. Deep learning has recently made great breakthroughs in medical image processing: computer vision techniques can support automatic or semi-automatic disease diagnosis, easing the challenges posed by ever-growing medical image data. However, the end-to-end working mode of deep learning is difficult to interpret, which hinders its further development in intelligent medicine. Researchers have therefore integrated natural language processing into medical imaging report generation, combining it with patients' clinical diagnostic information to build multimodal models that automatically generate diagnostic reports. Such algorithms can emulate physicians in writing diagnostic reports, provide understandable evidence for disease diagnosis, and improve the interpretability of deep learning on medical imaging data. Nevertheless, existing algorithms still face the following challenges. (1) Medical images are highly similar, i.e., visual differences between image features are usually fine-grained. Existing report generation models struggle to extract the fine-grained features needed to guide report generation, so the generated reports match those provided by radiologists poorly. In addition, owing to individual differences, the number, volume, and location of lesions vary across patients, leading to a high misdiagnosis rate in the reports generated by existing algorithms. (2) Diagnostic reports suffer from a long-tail problem: crucial, lesion-related medical terms occur rarely, while non-crucial words occur frequently. Existing models usually assign the same weight to all words, lowering the accuracy of the generated reports. (3) Automatically generated diagnostic reports lack clinical interpretability. A diagnostic report consists of summary statements and descriptive statements; because of their long-text nature, generated reports easily contain unnecessary repeated sentences or paragraphs, degrading quality, hampering comprehension, and preventing radiologists from making accurate clinical judgments.

Given these problems, this thesis focuses on the key technologies for automatically generating medical image diagnostic reports and proposes a series of automated report generation algorithms. The main contributions are as follows. (1) A high-resolution representation learning model based on a residual attention mechanism, HRRANet, is proposed to mine fine-grained features in medical images. First, the model exploits information interaction among multi-resolution branches to extract multi-scale features. Second, multiple residual attention modules are integrated into the original model to extract global features around lesions. Third, the multi-scale and global features are fused through skip connections. Finally, the resulting fine-grained features are fed into the decoder to guide the generation of diagnostic reports. (2) An unsupervised, lightweight term weighting algorithm, HRRA-TWNet, is proposed on top of the HRRANet feature extractor so that the model assigns different weights to the words in a diagnostic report. First, the segtok tokenizer is used to preprocess the words in the report. Second, features are extracted from term case, term frequency, context relevance, and the frequency of term occurrence across different statements. Finally, the term weights are incorporated into the cross-entropy loss of report generation to compute a term-weighted loss. (3) A repetition penalty mechanism is added to the HRRA-TWNet model to reduce unnecessary repeated statements and paragraphs and thereby improve the clinical interpretability and rationality of diagnostic reports. First, word frequencies in the generated report are computed. Then, words with high repetition frequency are assigned an exponential penalty probability. Finally, this penalty is integrated into the cross-entropy loss to constrain non-essential repetitive words.

In this thesis, ablation and comparison experiments were performed on the IU-Xray and MIMIC-CXR datasets. Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), and Consensus-based Image Description Evaluation (CIDEr) were used to evaluate model performance. The experimental results show that the proposed models accurately generate diagnostic reports from medical images, outperform existing models, and improve the clinical interpretability of deep-learning-based disease diagnosis.
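The residual attention idea in contribution (1) can be illustrated with a minimal sketch. The following toy function is an assumption-laden simplification, not the HRRANet architecture: it works on a 1-D feature list and uses a plain sigmoid gate as the attention mask, applying the standard residual attention form out = x * (1 + M(x)):

```python
import math

def residual_attention(features):
    # Toy 1-D residual attention (illustrative sketch, not HRRANet):
    # a soft mask M(x) in (0, 1) gates the features multiplicatively,
    # while the residual "1 +" term preserves the original signal:
    #     out = x * (1 + M(x))
    mask = [1.0 / (1.0 + math.exp(-x)) for x in features]  # sigmoid mask
    return [x * (1.0 + m) for x, m in zip(features, mask)]

# Salient (large) activations are amplified; zero activations stay zero.
amplified = residual_attention([0.0, 2.0])
```

In the full model, such gated features would be computed per resolution branch and fused via skip connections before reaching the decoder.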
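The unsupervised term weighting of contribution (2) can be sketched as follows. This toy implementation is a simplified assumption: it combines only a case feature, an inverse-frequency feature, and sentence spread (the thesis additionally uses context relevance and segtok preprocessing), and then scales a per-token cross-entropy by the resulting weights:

```python
import math
from collections import Counter

def term_weights(sentences, alpha=1.0):
    # Toy unsupervised term weighting (simplified sketch): rare,
    # lesion-related terms should get larger weights than frequent
    # function words, countering the long-tail problem.
    tokens = [t for s in sentences for t in s.split()]
    tf = Counter(t.lower() for t in tokens)
    n = sum(tf.values())
    weights = {}
    for t in set(tokens):
        key = t.lower()
        case = 1.0 if t[0].isupper() else 0.0            # term-case feature
        rarity = math.log(n / tf[key])                   # inverse term frequency
        spread = sum(key in s.lower().split() for s in sentences) / len(sentences)
        weights[key] = 1.0 + alpha * (case + rarity + spread)
    return weights

def weighted_cross_entropy(probs, targets, weights):
    # Per-token cross-entropy scaled by the weight of each target word.
    return -sum(weights.get(t, 1.0) * math.log(p)
                for p, t in zip(probs, targets)) / len(targets)

report = ["The heart size is normal .",
          "There is a small left pleural effusion ."]
w = term_weights(report)
# The rare, lesion-related term "effusion" outweighs the frequent "is".
```

Because the weighting uses only corpus statistics, it needs no labels and adds negligible overhead to training, which is the motivation for a lightweight unsupervised scheme.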
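The repetition penalty of contribution (3) can be illustrated with a decoding-style sketch. Note the assumptions: the thesis integrates an exponential penalty into the training loss, whereas this toy version applies the common CTRL-style form at inference, damping the logit of a token exponentially in how many times it has already been generated:

```python
def repetition_penalty(logits, generated_ids, gamma=1.2):
    # Toy exponential repetition penalty (illustrative sketch): a token
    # already generated k times has its logit damped by gamma**k, so
    # each further repetition becomes progressively less likely.
    counts = {}
    for t in generated_ids:
        counts[t] = counts.get(t, 0) + 1
    out = list(logits)
    for t, k in counts.items():
        # Shrink positive logits, push negative logits further down.
        out[t] = out[t] / gamma ** k if out[t] > 0 else out[t] * gamma ** k
    return out

# Token 0 was generated twice, so its logit 2.0 is damped to 2.0 / gamma**2.
penalized = repetition_penalty([2.0, 1.0, 0.5], [0, 0], gamma=2.0)
```

The exponential form means a first repeat is only mildly discouraged while long repeated runs are suppressed strongly, which targets the repeated-sentence and repeated-paragraph failure mode without banning legitimate reuse of medical terms.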