| Retinal vein occlusion is one of the more prevalent retinal vascular diseases. It is currently the second most common retinal vascular disease causing vision loss in the middle-aged and elderly population, after diabetic retinopathy. Moreover, retinal vein occlusion has a high probability of causing complications such as macular edema in its late stages. Treatment options are very limited and focus mainly on preventing or alleviating these complications. In clinical practice, ophthalmologists diagnose and treat patients based on their retinal images. However, the high ratio of patients to doctors makes manual diagnosis inefficient, so there is an urgent need for automated technology to assist in disease detection. With the rapid development of artificial intelligence, a growing number of computer-aided diagnosis methods operate on medical images, and deep learning techniques, represented by convolutional neural networks, are the most widely applied. This thesis therefore focuses on automatic diagnosis algorithms for retinal images, taking pathological multicolor (MC) images of retinal vein occlusion as its subject and using deep learning techniques and Transformer structures. The main work and innovations of this thesis are as follows. Firstly, an MC image classification method based on a multi-scale attention network is proposed to address the low contrast between target and background in some MC images. It simultaneously extracts the pathological features of the four images corresponding to each eye. The multi-scale attention module enhances the association between target pixels and weakens the dependency between pixels of different classes. The module establishes dependencies among image channel features and spatial locations through channel and spatial attention mechanisms. This approach mitigates the insensitivity of convolution operations to the global position of features by reinforcing the potential consistency and wholeness of the image data. Secondly, a joint framework called Res TR is proposed to address the tendency of medical images to lose explicit sequence information. Res TR combines the Transformer mechanism with convolutional neural networks. The framework takes advantage of the Transformer’s ability to capture global long-range sequence dependencies to extract semantic features more efficiently, while convolutional neural networks are good at extracting high-level abstract structural features from images. Res TR combines the advantages of both to make maximal use of the lesion information in the images when classifying the presence or absence of retinal vein occlusion. In particular, a hybrid loss function is proposed to measure how far the predictions deviate from the ground truth, which helps to further optimize model performance. |
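
The channel and spatial attention mechanisms mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis's multi-scale attention module, whose details the abstract does not give. It assumes a generic gating scheme: a small shared two-layer perceptron over pooled channel statistics, followed by a per-location gate. The function names, the weights `w1`, `w2`, `w_avg`, `w_max`, and the feature-map size are all hypothetical. A convolutional layer would normally produce the spatial gate, and it is replaced here by a scalar-weighted combination to keep the example short.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Per-channel gates of shape (C, 1, 1) for a feature map of shape (C, H, W)."""
    avg = feat.mean(axis=(1, 2))  # (C,) average-pooled channel descriptor
    mx = feat.max(axis=(1, 2))    # (C,) max-pooled channel descriptor

    def mlp(v):
        # Shared two-layer perceptron with a ReLU bottleneck (assumed design).
        return w2 @ np.maximum(w1 @ v, 0.0)

    gate = sigmoid(mlp(avg) + mlp(mx))  # (C,) values in (0, 1)
    return gate[:, None, None]


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Per-location gates of shape (1, H, W).

    Simplification: scalar weights stand in for a learned convolution.
    """
    avg = feat.mean(axis=0)  # (H, W) mean over channels
    mx = feat.max(axis=0)    # (H, W) max over channels
    return sigmoid(w_avg * avg + w_max * mx)[None, :, :]


def attend(feat, w1, w2):
    """Apply channel gating first, then spatial gating."""
    feat = feat * channel_attention(feat, w1, w2)
    return feat * spatial_attention(feat)


# Hypothetical sizes: 8 channels, 16x16 feature map, bottleneck ratio 2.
rng = np.random.default_rng(0)
C, H, W, R = 8, 16, 16, 2
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // R, C)) * 0.1  # random stand-ins for learned weights
w2 = rng.standard_normal((C, C // R)) * 0.1
out = attend(feat, w1, w2)
```

Because both gates lie in (0, 1), the output has the same shape as the input and every activation is scaled down rather than amplified. The gates reweight channels and locations without changing the layout of the feature map.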