Single-modality information retrieval is increasingly unable to meet the growing demand for information processing, and deep learning-based multimodal retrieval, as a new information retrieval scheme, has gradually attracted more attention. Cross-modal retrieval refers to using information of one modality to retrieve semantically similar information of another modality. Unlike single-modality retrieval, data from different modalities differ greatly in their representations and contain a large amount of noise, which makes cross-modal retrieval more difficult. Research on cross-modal image-text retrieval has made some progress, but retrieval performance still needs to be improved. Two challenges remain in this field: first, many studies have ignored retrieval among similar images and texts; second, text queries in practical applications of cross-modal retrieval may be inaccurate. The main research content and innovations of this thesis are as follows:

(1) A distribution embedding-based cross-modal retrieval model (DERM) is proposed to address one-to-many and many-to-one correspondences between images and texts in cross-modal image-text retrieval. The model extracts image and text features with deep networks, feeds these base features into a dedicated embedding network that combines global and local features with residual learning, and obtains representations of images and texts in a shared space. Representing each image and text as a Gaussian distribution in the high-dimensional space, rather than as a single point, improves the model's ability to retrieve among similar images and texts.

(2) An iterative query cross-modal retrieval model (IQRM) is proposed to address the inaccurate text queries that may arise in cross-modal retrieval. The model consists of four modules: image feature extraction, text feature extraction, matching and ranking, and query enhancement. It extracts image and text features with deep learning models and matches them with an image-text stacked cross-attention algorithm. In the query enhancement module, deep reinforcement learning selects the most discriminative target object category in the retrieval results for user confirmation, which enriches the text query and improves retrieval performance.

(3) A prototype cross-modal image-text retrieval system based on deep learning is designed and implemented. Its main functions include user registration, login, text-based search, image-based search, and display of retrieval results, meeting users' needs for cross-modal image-text retrieval.
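To make the distribution-embedding idea behind contribution (1) concrete, the following is a minimal sketch, not the thesis's actual DERM implementation: it assumes a PyTorch setup, and the module names (GaussianEmbedding, sampled_similarity), layer sizes, residual projection, and Monte-Carlo cosine similarity measure are all illustrative assumptions. Each image or text feature is mapped to the mean and diagonal variance of a Gaussian in a shared space, and similarity is scored between distributions rather than between single points.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianEmbedding(nn.Module):
    """Map a base feature vector to a diagonal Gaussian (mean, log-variance)
    in a shared embedding space. Layer sizes are illustrative assumptions."""
    def __init__(self, in_dim, embed_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, embed_dim), nn.ReLU())
        self.mu_head = nn.Linear(embed_dim, embed_dim)
        self.logvar_head = nn.Linear(embed_dim, embed_dim)
        # residual projection: keep a shortcut from the input feature to the mean
        self.residual = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        h = self.backbone(x)
        mu = self.mu_head(h) + self.residual(x)  # residual learning on the mean
        logvar = self.logvar_head(h)             # per-dimension uncertainty
        return mu, logvar

def sampled_similarity(mu_a, logvar_a, mu_b, logvar_b, n_samples=8):
    """Monte-Carlo similarity between two diagonal Gaussians: draw samples
    from each and average their pairwise cosine similarity. Matching
    distributions instead of points is one way to soften one-to-many and
    many-to-one image-text correspondences."""
    std_a, std_b = (0.5 * logvar_a).exp(), (0.5 * logvar_b).exp()
    sims = []
    for _ in range(n_samples):
        za = mu_a + std_a * torch.randn_like(std_a)
        zb = mu_b + std_b * torch.randn_like(std_b)
        sims.append(F.cosine_similarity(za, zb, dim=-1))
    return torch.stack(sims, dim=0).mean(dim=0)

# Toy usage: 2048-d image features (e.g. CNN pooling) vs. 768-d text features.
img_enc, txt_enc = GaussianEmbedding(2048), GaussianEmbedding(768)
img_feat, txt_feat = torch.randn(4, 2048), torch.randn(4, 768)
score = sampled_similarity(*img_enc(img_feat), *txt_enc(txt_feat))
print(score.shape)  # torch.Size([4]) -- one similarity per image-text pair
```

In practice such scores would be trained with a ranking or contrastive loss over matched and mismatched image-text pairs; the sketch above only shows how a Gaussian embedding and a distribution-level similarity could be wired together.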