The intelligent diagnostic classification of medical images and retrieval-based text generation are extremely valuable in the "Internet + medical health" field. These technologies can reduce healthcare professionals' workload and improve the efficiency of medical image analysis and diagnosis. However, existing algorithms for medical imaging diagnosis and retrieval-based text generation can be improved in several respects: 1) image resolution should be improved to prevent poor classification models caused by minor variations between images; 2) the correlation between cross-modal data should be strengthened to narrow the heterogeneity gap between modalities; 3) the technique for fusing retrieved text into the generation model should be optimized to eliminate redundant information in the generation process. To address these issues, the main research contributions of the dissertation are as follows:

(1) Transformer-based factorized encoder for 3D CT image classification. X-ray images are widely used in existing diagnostic classification; however, they lack sufficient semantic information to distinguish between images of the same disease at different stages. Furthermore, because of the enormous number of convolution operations, existing CT-based methods are computationally expensive and lack long-range interaction between CT slices. To address these issues, a high-resolution dataset of pneumoconiosis CT images is constructed, and a transformer-based factorized encoder is proposed to model long-range interactions both within and between CT slices, alleviating the model degradation caused by minor differences between images. The accuracy of the proposed method is 2.94% higher than that of COVID-Net.

(2) A unified perspective on multi-level cross-modal similarity for cross-modal retrieval. Existing algorithms for evaluating cross-modal similarity ignore the local relationships between cross-modal data, limiting model performance. In addition, when computing label similarity, the classifier's classification bias can impair retrieval accuracy. A unified multi-level cross-modal similarity method is therefore proposed, which measures multi-level cross-modal similarity in a single common feature space. On the multi-modal datasets Pascal Sentence, Wikipedia, and XMediaNet, the average normalized discounted cumulative gain (NDCG) improves by 3.6%, 3.7%, and 6.5%, respectively, over DSRAN, a method based on dual semantic relations.

(3) A semi-supervised cross-modal memory bank for cross-modal retrieval. When measuring the correlation between unlabelled data, existing algorithms assume that unlabelled samples are correlated with their predefined k-nearest neighbours, which creates false connections between unrelated unlabelled samples and reduces the accuracy of cross-modal retrieval. To provide accurate supervision for unlabelled data, a semi-supervised cross-modal memory bank is proposed that revises pseudo-labels using both the feature representations of paired cross-modal data and the class probabilities of labelled data. With a supervision rate of 10%, the average MAP@50 of the proposed method on Wikipedia, NUS-WIDE, and MS-COCO increases by 2.6%, 1.8%, and 4.9%, respectively, over the semi-supervised method SCLss. The experimental results demonstrate that the proposed method outperforms existing methods.

(4) Retrieval-based adaptive fusion strategy for medical image report generation. When previous methods take X-ray images as the generative model's input, the variation between images is low, so the generated reports are highly similar to one another. Moreover, flaws in the fusion strategy introduce a considerable amount of redundant information into the generated text, lowering its quality. To solve these problems, we collect CT image-text data covering 8 lung diseases and propose a retrieval-based adaptive fusion strategy, which adds the weighted retrieval probability to the generation probability to realize a dynamic fusion process. Compared with the unweighted fusion method, the Consensus-based Image Description Evaluation (CIDEr) score of the proposed method improves by 15.9%. The experimental results show that text generated by the proposed method is closer to human-written reports.
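The fusion step in contribution (4) can be illustrated with a minimal sketch. The function name, the toy vocabulary, and the fixed scalar `weight` are hypothetical (the dissertation realizes the weighting dynamically); the sketch only shows the core idea of adding a weighted retrieval distribution to the generator's distribution and renormalizing.

```python
import numpy as np

def adaptive_fusion(p_gen, p_ret, weight):
    """Fuse generation and retrieval probabilities over a shared vocabulary.

    p_gen:  next-token distribution from the generation model.
    p_ret:  next-token distribution induced by the retrieved report.
    weight: scalar controlling the retrieval contribution (hypothetical
            fixed value here; a dynamic weight would be computed per step).
    """
    fused = p_gen + weight * p_ret
    return fused / fused.sum()  # renormalize to a valid distribution

# Toy 4-token vocabulary: the retrieval term re-ranks candidate tokens
# without overriding a confident generator.
p_gen = np.array([0.1, 0.6, 0.2, 0.1])
p_ret = np.array([0.5, 0.1, 0.3, 0.1])
p = adaptive_fusion(p_gen, p_ret, weight=0.4)
print(p)
```

With `weight=0` the strategy reduces to pure generation, so the unweighted baseline in the comparison corresponds to fixing the retrieval contribution rather than adapting it.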