
Design And Implementation Of Cross-modal Retrieval For Images And Texts Based On Deep Learning And Hashing Methods

Posted on: 2024-01-29  Degree: Master  Type: Thesis
Country: China  Candidate: Z L Luo  Full Text: PDF
GTID: 2568306944958099  Subject: Computer technology
Abstract/Summary:
With the advent of the Internet era, the explosion of image and text data online has created an urgent need to retrieve the information people want from massive data efficiently and accurately. The field of image and text retrieval currently faces several problems: text-based single-modal retrieval suffers from high information redundancy or information loss; image-based single-modal retrieval consumes substantial resources and is inefficient; and cross-modal retrieval algorithms fail to achieve sufficient "semantic alignment" and are also inefficient. To address these problems, this thesis makes the following contributions.

(1) A single-modal retrieval algorithm based on deep learning and hashing methods is proposed, together with a multi-modal data processing algorithm. For text data, an algorithm based on BERT and a hash encoder is proposed to achieve large-scale, efficient semantic text retrieval. For image data, a mean hash algorithm based on grayscale value comparison is proposed and combined with ElasticSearch to achieve large-scale, efficient, and accurate image retrieval. In addition, a multi-modal data processing solution for audio and video data is proposed to support audio and video retrieval.

(2) A cross-modal retrieval algorithm based on pre-trained models and encoders is proposed. A pre-trained model based on a multi-path Transformer allows data from different modalities to interact fully and share information, thereby achieving high-quality "semantic alignment". On top of this pre-trained model, a dual encoder and a fusion encoder are constructed: the dual encoder performs "coarse recall" and the fusion encoder performs "precise ranking", which together yield efficient and accurate cross-modal retrieval.

(3) An image-text cross-modal retrieval system based on deep learning and hashing methods is designed and implemented. The system follows a layered design, integrates the single-modal and cross-modal retrieval algorithms above, and makes use of middleware such as RocketMQ, Redis, and Nginx, resulting in an efficient, accurate, and highly available image-text cross-modal retrieval system.

Finally, the system was applied in the national key R&D program "Research and Application Demonstration of the Winter Olympics Global Communication Platform". During the Winter Olympics, it provided high-quality retrieval services to a large number of users and was highly recognized by the Ministry of Science and Technology and the Winter Olympics Organizing Committee.
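The abstract does not spell out the mean hash algorithm, so the following is only a minimal sketch of the commonly used average-hash formulation, which may differ from the thesis's actual implementation. It assumes the image has already been converted to grayscale and downscaled to a small grid; the function names `average_hash` and `hamming_distance` and the toy 4x4 images are hypothetical, and the ElasticSearch integration is not shown.

```python
from typing import Sequence


def average_hash(gray: Sequence[Sequence[int]]) -> int:
    """Hash a small grayscale grid: one bit per pixel, set when pixel >= mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # Shift earlier bits left and append the comparison result for this pixel.
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Toy 4x4 grayscale "image" (assumed to be already downscaled).
img = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
# A uniformly brighter copy and a tonally inverted copy for comparison.
brighter = [[min(255, p + 20) for p in row] for row in img]
inverted = [[255 - p for p in row] for row in img]

h_img = average_hash(img)
h_brighter = average_hash(brighter)
h_inverted = average_hash(inverted)

print(hamming_distance(h_img, h_brighter))  # small distance: same structure
print(hamming_distance(h_img, h_inverted))  # large distance: opposite structure
```

Because each bit only records whether a pixel is above or below the image's own mean, a uniform brightness shift leaves the hash unchanged, and near-duplicate images can be found by thresholding the Hamming distance between hashes.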
Keywords/Search Tags:deep learning, hash coding algorithm, graphic and textual cross-modal retrieval