A medical report is a textual description of the medical information presented in medical images (such as PET/CT or radiology images), including detailed descriptions of the shapes of different organs, the abnormalities of lesions, and the clinical diagnosis made by radiologists on the basis of the imaging information. Medical reports therefore play an important role in disease diagnosis and patient treatment. The rapid development of Deep Learning (DL) in research fields such as Computer Vision (CV) and Natural Language Processing (NLP) has made smart healthcare an increasingly important area of applied research, and medical report generation has become an urgent and attractive research direction in this field. The main purpose of the medical report generation task is to automatically generate a medical report for a given medical image using Artificial Intelligence (AI) methods. This not only greatly reduces the workload of radiologists but also helps patients receive treatment in time. However, many difficulties remain in current research on this task. To address them, and based on an analysis of existing methods, this paper proposes a novel method for medical report generation. The main research work of this paper can be summarized as follows:

(1) Because of the large gap between the visual semantic features of medical images and the textual semantic features of medical reports, the professionalism and accuracy of the generated reports cannot be guaranteed. In addition, many existing methods fail to make effective and sufficient use, during feature extraction, of the large number of similar pathological features shared across different medical images, which weakens the model's ability to capture and learn medical information. Considering these issues, this paper proposes a Visual Memory and External Knowledge Based Network for Medical Report Generation (VMEKNet), which integrates visual memory and external knowledge into the task. Specifically, two modules are proposed and introduced into the Encoder-Decoder framework widely used for medical report generation. The TF-IDF Embedding (TIE) module integrates external knowledge (the TF-IDF semantic features of the text) into the feature extraction stage through the TF-IDF algorithm, and the Visual Memory (VIM) module makes full use of previously extracted image features to help the model obtain more accurate medical image features. A standard Transformer model then processes the image and text features to generate a complete medical report.

(2) In the datasets commonly used for medical report generation, there is a large imbalance between the numbers of normal and abnormal samples, which prevents the model from fully learning abnormal samples and generating medical reports for abnormal images; this imbalance is also an important factor hindering research on the task. To address this problem, this paper proposes an optimized medical report generation method that incorporates Contrastive Learning (CL). First, a self-supervised visual feature extraction module based on contrastive learning is proposed. Then, to further improve the performance of the VMEKNet proposed in Chapter 3, this module is combined with VMEKNet to form the optimized medical report generation model, Contrastive Learning based VMEKNet (CL-VMEKNet).
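As a rough illustration of the idea behind the TIE module, TF-IDF weights computed over the report corpus can be projected into the hidden space used by the encoder and fused with visual features. The sketch below is only an assumption-based example; the vectorizer settings, vocabulary size, and hidden dimension are illustrative and do not reflect the exact configuration of VMEKNet.

```python
# Minimal sketch: turning report text into TF-IDF features usable as
# external knowledge in a report-generation encoder.
# Vocabulary size (1000) and hidden size (512) are illustrative assumptions.
import torch
import torch.nn as nn
from sklearn.feature_extraction.text import TfidfVectorizer

reports = [
    "the heart size is normal . no focal consolidation .",
    "there is a small left pleural effusion .",
]

# Fit TF-IDF over the training reports (the external textual knowledge).
vectorizer = TfidfVectorizer(max_features=1000)
tfidf = vectorizer.fit_transform(reports).toarray()   # shape: (N, vocab)

# Project the sparse TF-IDF vectors into the model's hidden space so that
# they can be fused with visual features in the encoder.
project = nn.Linear(tfidf.shape[1], 512)
tie_embedding = project(torch.tensor(tfidf, dtype=torch.float32))  # (N, 512)
print(tie_embedding.shape)
```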
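The abstract does not spell out the contrastive objective used in CL-VMEKNet. The following is a minimal SimCLR-style sketch of self-supervised visual feature learning, assuming two augmented views of each image are encoded and pulled together with an NT-Xent loss; the feature dimension and temperature are illustrative assumptions.

```python
# Minimal SimCLR-style sketch of contrastive visual feature learning.
# The temperature and feature dimension are illustrative assumptions,
# not the exact configuration used by CL-VMEKNet.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """Contrast projected features of two augmented views of the same images."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)            # (2N, D)
    sim = z @ z.t() / temperature             # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))         # exclude self-similarity
    # The positive for the i-th view in z1 is the i-th view in z2, and vice versa.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Example: features from two augmentations of a batch of 8 chest X-rays.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(nt_xent_loss(z1, z2).item())
```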
To verify the effectiveness of the proposed methods, this paper designs and conducts quantitative, qualitative, and ablation experiments on the medical report generation dataset IU X-Ray and analyzes the experimental results. The experiments demonstrate that the proposed methods outperform previous works on both natural language generation metrics and practical clinical diagnosis. The methods thus show superior performance on the medical report generation task and have clinical application value.