
Research on Acceleration and Compression Technology for Cross-modal Retrieval Models

Posted on: 2024-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Chen
GTID: 2568307076992849
Subject: Computer technology
Abstract:
With the rapid development of the mobile Internet, the amount of multi-modal data such as images, text, and video has grown significantly. This has shifted the data modalities users are interested in and changed their retrieval needs from single-modal retrieval to cross-modal retrieval. Cross-modal retrieval is dedicated to the interaction of information between two different modalities, i.e., using a query sample from one modality to retrieve samples with similar semantics from another modality. In the era of big data, cross-modal retrieval has become an essential information retrieval tool, as it matches users' retrieval needs better than single-modal retrieval. Currently, visual-language large models based on the Transformer architecture are the dominant approach owing to their high accuracy, and by structure they can be classified as single-stream or dual-stream. Both structures require large, complex models and substantial computational resources to learn useful information from large-scale datasets, which is clearly beyond the reach of many resource-constrained devices. To resolve the conflict between models that demand large computational resources and many parameters on the one hand and devices with limited resources on the other, the models must be compressed and accelerated while keeping performance essentially unchanged. This paper therefore studies model acceleration and compression for the cross-modal retrieval model TERAN. The main contributions are as follows:

(1) A multi-module collaborative knowledge distillation algorithm is proposed to compress the model. The algorithm adopts a modular distillation approach, dividing the distillation process into three parts: distillation of the feature extraction module, distillation of the feature learning module, and distillation of the similarity calculation module (a rough sketch of such an objective is given below). Experimental results show that the accuracy of the distilled student model improves by about 5% over the student model before distillation, and the number of model parameters is reduced by 50% while accuracy remains roughly comparable to that of the teacher model.
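The abstract gives no implementation details, so the following PyTorch-style sketch is only a plausible illustration of a three-part, module-wise distillation objective. The dictionary keys, the loss choices (MSE for intermediate features, temperature-scaled KL divergence for similarity scores), and the assumption that teacher and student feature dimensions already match are all hypothetical, not taken from the thesis.

```python
# Hypothetical sketch of a multi-module collaborative distillation loss.
# Module names and loss choices are illustrative assumptions only.
import torch
import torch.nn.functional as F

def modular_distillation_loss(teacher_out, student_out, tau=2.0):
    """Sum of per-module distillation terms.

    teacher_out / student_out are dicts of intermediate outputs:
      'feat_extract': features from the feature extraction module
      'feat_learn':   contextualized features from the feature learning module
      'similarity':   image-text similarity scores from the matching module
    Assumes teacher and student feature shapes match (a learned projection
    would otherwise be needed on the student side).
    """
    # 1) Feature extraction module: match intermediate features (MSE).
    l_extract = F.mse_loss(student_out['feat_extract'],
                           teacher_out['feat_extract'].detach())

    # 2) Feature learning module: match contextualized representations.
    l_learn = F.mse_loss(student_out['feat_learn'],
                         teacher_out['feat_learn'].detach())

    # 3) Similarity calculation module: soften the teacher's image-text
    #    similarity distribution and minimize the KL divergence to it.
    t_probs = F.softmax(teacher_out['similarity'].detach() / tau, dim=-1)
    s_logp = F.log_softmax(student_out['similarity'] / tau, dim=-1)
    l_sim = F.kl_div(s_logp, t_probs, reduction='batchmean') * tau * tau

    return l_extract + l_learn + l_sim
```

In practice, halving the parameter count usually means the student's hidden size is smaller than the teacher's, in which case a learned linear projection would typically be inserted before the feature-matching terms.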
(2) To further improve the inference speed of the model, a cross-modal retrieval model based on two-stage retrieval, TSCMR, is proposed. The model divides image-text retrieval into two stages, coarse-grained matching and fine-grained matching, to optimize the retrieval process. In the coarse-grained stage, global features representing the images and texts are compared, and only the retrieval targets with the top-k coarse-grained scores advance to the fine-grained stage, reducing the time and computational resources consumed by fine-grained matching and thereby accelerating inference (the pipeline is sketched below). Experimental results show that the model achieves a 3.1× inference speed-up with performance comparable to that of TERAN.

(3) For the two-stage cross-modal retrieval model, the multi-module collaborative knowledge distillation algorithm is redesigned: the distillation of the similarity calculation module is divided into two parts, one for the coarse-grained matching stage and one for the fine-grained matching stage, so that the model is compressed on top of the inference acceleration already achieved. Experimental results show that the accuracy of the distilled student model improves by about 5% over the pre-distillation student model, and a 3.2× inference speed-up and a 50% reduction in the number of model parameters are achieved with accuracy comparable to that of the teacher model.
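As a rough illustration of the coarse-to-fine pipeline in (2), the sketch below ranks the whole gallery with a cheap global-feature similarity, then re-ranks only the top-k candidates with an expensive fine-grained matcher. The function names, the value of k, and the interface of `fine_score_fn` (a placeholder for a TERAN-style fine-grained scorer) are assumptions for illustration, not the thesis's actual API.

```python
# Hypothetical sketch of two-stage (coarse-to-fine) cross-modal retrieval.
import torch
import torch.nn.functional as F

def two_stage_retrieve(query_global, gallery_global,
                       query_fine, gallery_fine, fine_score_fn, k=20):
    """Return gallery indices ranked by the two-stage scheme.

    Stage 1 ranks the entire gallery with cheap global-feature cosine
    similarity and keeps the top-k candidates; Stage 2 re-ranks only
    those k with the expensive fine-grained matcher, so its cost is
    O(k) instead of O(|gallery|).  fine_score_fn is assumed to return
    a scalar tensor for one query-candidate pair.
    """
    k = min(k, gallery_global.shape[0])

    # Stage 1: coarse matching with normalized global features.
    q = F.normalize(query_global, dim=-1)           # (d,)
    g = F.normalize(gallery_global, dim=-1)         # (N, d)
    coarse = g @ q                                  # similarity to each item
    topk = torch.topk(coarse, k).indices

    # Stage 2: fine-grained matching only on the shortlisted candidates.
    fine = torch.stack([fine_score_fn(query_fine, gallery_fine[i])
                        for i in topk.tolist()])
    order = torch.argsort(fine, descending=True)
    return topk[order]
```

For the redesigned distillation in (3), both the coarse-grained scores from Stage 1 and the fine-grained scores from Stage 2 would each receive their own distillation term, analogous to the similarity term in the first sketch.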
Keywords: Cross-modal retrieval, Model acceleration, Model compression, Two-stage retrieval, Knowledge distillation