
Research On The Method Of Laryngoscope Imaging Report Generation Based On Image Caption

Posted on: 2021-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: G D Li
GTID: 2504306050470804
Subject: Circuits and Systems
Abstract/Summary:
Medical imaging report generation aims to automatically produce a natural-language description of the content of a medical image. The approach has considerable application value in medical image understanding and computer-aided diagnosis. Unlike coarse-grained medical image understanding tasks such as classification and annotation, report generation must convert images into continuous text, which involves both image feature extraction and text generation. The task first requires high-quality image features, from which a coherent, grammatical medical report must then be generated. It therefore spans three fields: computer vision, natural language processing, and medicine. Research on medical imaging report generation thus benefits not only image and text processing but also the development of intelligent medicine.

This thesis surveys recent domestic and international work on medical imaging report generation and finds that most current methods use a Convolutional Neural Network (CNN) as an encoder to extract medical image features, and a Recurrent Neural Network (RNN) or one of its variants, Long Short-Term Memory (LSTM) or the Gated Recurrent Unit (GRU), as a decoder to generate the report. Although this end-to-end encoder-decoder model has achieved some success, several problems remain. Specifically, the text does not sufficiently constrain the image feature extraction process, the image features are not fully utilized, and the modeling of the attention mechanism and of image-text fusion is incomplete. This thesis addresses these three problems as follows.

First, this thesis proposes a laryngoscope report generation model with multi-channel image-text constraints. The model strengthens the text's constraint on image feature extraction by adding two channels to the basic CNN-GRU structure, for three channels in total. The first channel is an image-to-text encoder-decoder. The second channel is a forward-text-to-forward-text encoder-decoder. The third channel is a reverse-text-to-forward-text encoder-decoder. The last two channels implement text-to-text modeling and introduce more textual information into the model, strengthening the text's constraint on image feature extraction. Results on the laryngoscope image dataset show that the model outperforms the other comparison models, and visualization results show that the image features it extracts are more similar to the text features than those extracted by the other comparison models.

Second, this thesis proposes a laryngoscope report generation model with multi-feature fusion guided decoding. A CNN first extracts the convolutional features and fully connected features of the image, and a GRU then learns the connection between image and text. In the text generation stage, the GRU output, the fully connected features, and the convolutional features are used together to generate text. Results on the laryngoscope image dataset show that the model outperforms the other comparison models, and visualization results show that the fused features complement one another during text generation.

Third, this thesis proposes a laryngoscope report generation model with dual GRUs. The model uses the basic CNN-GRU structure, but in the decoder an Attention GRU and a Language GRU model the attention mechanism and image-text fusion respectively, thereby avoiding the use of a single LSTM to model both tasks at the same time. Results on the laryngoscope image dataset show that the model outperforms the other comparison models, and visualization results show that the attention mechanism and image-text fusion in this model are significantly improved over the baseline.
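
The source/target pairing used by the three channels of the multi-channel model can be illustrated with a small sketch. It only shows how one sample might be expanded into three encoder-decoder training pairs (image to text, forward text to forward text, reverse text to forward text). The function name `build_channel_pairs`, the whitespace tokenisation, the image identifier, and the example sentence are all assumptions made for illustration; none of them come from the thesis or its dataset.

```python
from typing import Dict, List, Tuple

Tokens = List[str]


def build_channel_pairs(image_id: str, report: str) -> Dict[str, Tuple[object, Tokens]]:
    """Build one (source, target) pair per channel for a single sample.

    Hypothetical helper: the thesis does not specify how the pairs are built.
    """
    forward = report.split()   # whitespace tokenisation (assumption)
    reverse = forward[::-1]    # same tokens in reverse order
    return {
        # Channel 1: image encoder -> text decoder
        "image_to_text": (image_id, forward),
        # Channel 2: forward-text encoder -> forward-text decoder
        "forward_text_to_text": (forward, forward),
        # Channel 3: reverse-text encoder -> forward-text decoder
        "reverse_text_to_text": (reverse, forward),
    }


# Made-up sample, for illustration only.
pairs = build_channel_pairs("img_0001", "vocal cords move symmetrically")
for name, (source, target) in pairs.items():
    print(name, source, "->", target)
```

In all three channels the decoder target is the forward report; only the encoder input changes. This matches the description that the two extra channels add textual information without changing what the model is asked to generate.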
Keywords/Search Tags:medical report generation, deep learning, convolutional neural network, recurrent neural network