Single-modality information retrieval is increasingly unable to meet the growing demand for information processing, and deep learning-based multimodal retrieval, as a new information retrieval scheme, has gradually attracted more attention. Cross-modal retrieval refers to using information of one modality to retrieve semantically similar information of another modality. Unlike single-modality retrieval, data from different modalities differ greatly in their representations and contain a large amount of noise, which makes cross-modal retrieval more difficult. Research on cross-modal image-text retrieval has made some progress, but retrieval performance still needs to be improved. Two challenges remain in this field: first, many studies have ignored retrieval among similar images and texts; second, text queries in practical applications of cross-modal retrieval may be inaccurate. The main research content and innovations of this thesis are as follows:

(1) A distribution embedding-based cross-modal retrieval model (DERM) is proposed to address one-to-many and many-to-one correspondences between images and texts in cross-modal image-text retrieval. The model extracts image and text features with deep networks, feeds these base features into a dedicated embedding network that combines global and local features with residual learning, and obtains representations of images and texts in a shared space. Representing each image and text as a Gaussian distribution in the high-dimensional space, rather than as a single point, improves the model's ability to retrieve among similar images and texts.

(2) An iterative query cross-modal retrieval model (IQRM) is proposed to address the inaccurate text queries that may arise in cross-modal retrieval. The model consists of four modules: image feature extraction, text feature extraction, matching and ranking, and query enhancement. It extracts image and text features with deep learning models and matches them with an image-text stacked cross-attention algorithm. In the query enhancement module, deep reinforcement learning selects the most discriminative target object category in the retrieval results for user confirmation, which enriches the text query and improves retrieval performance.

(3) A prototype cross-modal image-text retrieval system based on deep learning is designed and implemented. Its main functions include user registration, login, text-based search, image-based search, and display of retrieval results, meeting users' needs for cross-modal image-text retrieval.
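To make the distribution-embedding idea behind contribution (1) concrete, the following is a minimal sketch, not the thesis's actual DERM implementation: it assumes a PyTorch setup, and the module names (GaussianEmbedding, sampled_similarity), layer sizes, residual projection, and Monte-Carlo cosine similarity measure are all illustrative assumptions. Each image or text feature is mapped to the mean and diagonal variance of a Gaussian in a shared space, and similarity is scored between distributions rather than between single points.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianEmbedding(nn.Module):
    """Map a base feature vector to a diagonal Gaussian (mean, log-variance)
    in a shared embedding space. Layer sizes are illustrative assumptions."""
    def __init__(self, in_dim, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.ReLU())
        self.mu_head = nn.Linear(embed_dim, embed_dim)
        self.logvar_head = nn.Linear(embed_dim, embed_dim)
        # residual projection: keep a shortcut from the input feature to the mean
        self.residual = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        h = self.backbone(x)
        mu = self.mu_head(h) + self.residual(x)  # residual learning on the mean
        logvar = self.logvar_head(h)             # per-dimension uncertainty
        return mu, logvar

def sampled_similarity(mu_a, logvar_a, mu_b, logvar_b, n_samples=8):
    """Monte-Carlo similarity between two diagonal Gaussians: draw samples
    from each and average their pairwise cosine similarity. Matching
    distributions instead of points is one way to soften one-to-many and
    many-to-one image-text correspondences."""
    std_a, std_b = (0.5 * logvar_a).exp(), (0.5 * logvar_b).exp()
    sims = []
    for _ in range(n_samples):
        za = mu_a + std_a * torch.randn_like(std_a)
        zb = mu_b + std_b * torch.randn_like(std_b)
        sims.append(F.cosine_similarity(za, zb, dim=-1))
    return torch.stack(sims, dim=0).mean(dim=0)

# Toy usage: 2048-d image features (e.g. CNN pooling) vs. 768-d text features.
img_enc, txt_enc = GaussianEmbedding(2048), GaussianEmbedding(768)
img_feat, txt_feat = torch.randn(4, 2048), torch.randn(4, 768)
score = sampled_similarity(*img_enc(img_feat), *txt_enc(txt_feat))
print(score.shape)  # torch.Size([4]) -- one similarity per image-text pair
```

In practice such scores would be trained with a ranking or contrastive loss over matched and mismatched image-text pairs; the sketch above only shows how a Gaussian embedding and a distribution-level similarity could be wired together.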