
Research On Compression Method Of Multimodal Pretrained Model Based On Knowledge Distillation

Posted on: 2023-11-16  Degree: Master  Type: Thesis
Country: China  Candidate: D W Liao  Full Text: PDF
GTID: 2558306617982119  Subject: Computer technology
Abstract/Summary:
Multimodal pre-trained models show strong performance on tasks such as visual-language classification and visual-language generation, enabling applications such as visual question answering, visual navigation, and visual reasoning. However, their large number of parameters makes inference slow and prevents them from running on devices with limited computing resources. Existing research mostly addresses this problem with model compression techniques such as early exiting, pruning, and knowledge distillation, among which knowledge distillation performs best. Yet, owing to the particularities of multimodal tasks, current multimodal knowledge distillation still suffers from problems such as distillation overfitting and poor task adaptation. This thesis therefore studies modality-specific layer-wise distillation and an adaptive variant of it. The main contents are as follows:

(1) Although existing layer-wise distillation methods can alleviate overfitting during distillation, the single-modal outputs of the student model still differ significantly from those of the teacher model, and taking only multimodal samples as input can cause the student model to overfit and the distillation to become inefficient. This thesis therefore proposes a modality-specific inter-layer distillation method that transfers knowledge from the teacher model to the student model more effectively by learning how the teacher model behaves in each modality (a minimal sketch of such a loss follows this abstract).

(2) It is unclear whether the mean-squared-error loss is the best way to measure inter-layer differences and whether it lets the student model learn as efficiently as possible. This thesis therefore conducts experiments with four methods of computing the distributional difference between layers and verifies the effectiveness of the mean squared error. Furthermore, in the layer-mapping step of inter-layer distillation, the original skip-step mapping and last-layer mapping methods cannot fully learn the teacher model's inter-layer knowledge and adapt poorly to different tasks. This thesis therefore proposes a many-to-many modality-specific inter-layer knowledge distillation method based on the Wasserstein distance, which considers the influence of all intermediate teacher layers on all intermediate student layers and adaptively generates a different layer-mapping relationship for each task (see the second sketch after this abstract).

(3) When visual-language models are deployed in mobile computing environments, problems arise such as the lack of on-device image feature extraction and the inability to perform inference without a network connection. This thesis designs a visual-language model deployment framework that addresses these problems in a unified way.
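The following is a minimal sketch, not the thesis's implementation, of the modality-specific inter-layer distillation idea from item (1): the student's intermediate hidden states for the text and image streams are matched separately against the corresponding teacher layers with a mean-squared-error loss. The class name, projection layer, layer dimensions, and the skip-style layer mapping used in the demo are illustrative assumptions.

```python
# Sketch of modality-specific inter-layer (layer-wise) distillation with an MSE loss.
# Assumptions: hidden states per modality are already available as lists of tensors,
# and a linear projection aligns the student's hidden size with the teacher's.
import torch
import torch.nn as nn


class ModalitySpecificLayerDistillLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student hidden states into the teacher's hidden size
        # so the MSE is computed in a common space.
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.mse = nn.MSELoss()

    def forward(self, student_layers, teacher_layers, layer_map):
        # student_layers / teacher_layers: dict modality -> list of tensors
        #   of shape (batch, seq_len, hidden)
        # layer_map: list of (student_layer_idx, teacher_layer_idx) pairs
        loss = 0.0
        for modality in ("text", "image"):
            for s_idx, t_idx in layer_map:
                s_hidden = self.proj(student_layers[modality][s_idx])
                t_hidden = teacher_layers[modality][t_idx].detach()
                loss = loss + self.mse(s_hidden, t_hidden)
        return loss


if __name__ == "__main__":
    # Toy usage: a 6-layer student distilled from a 12-layer teacher
    # with a simple skip mapping (student layer i <- teacher layer 2i+1).
    batch, seq, s_dim, t_dim = 2, 16, 384, 768
    student = {m: [torch.randn(batch, seq, s_dim) for _ in range(6)]
               for m in ("text", "image")}
    teacher = {m: [torch.randn(batch, seq, t_dim) for _ in range(12)]
               for m in ("text", "image")}
    layer_map = [(i, 2 * i + 1) for i in range(6)]
    criterion = ModalitySpecificLayerDistillLoss(s_dim, t_dim)
    print(criterion(student, teacher, layer_map).item())
```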
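The second sketch illustrates, under assumptions, the many-to-many layer mapping described in item (2): a transport plan between all student layers and all teacher layers is estimated from their pairwise distances with a few Sinkhorn iterations and then used to weight the layer-to-layer terms. The cost definition, entropic regularization, iteration count, and the use of pooled layer representations are illustrative choices, not the thesis's exact Wasserstein-distance formulation.

```python
# Sketch of many-to-many inter-layer distillation via an optimal-transport-style
# layer mapping. Assumes student and teacher hidden states have already been
# projected to a common dimension.
import torch


def sinkhorn_plan(cost: torch.Tensor, eps: float = 0.1, iters: int = 50) -> torch.Tensor:
    """Approximate entropic optimal-transport plan with uniform marginals."""
    # Scale the cost matrix before exponentiating, for numerical stability.
    cost = cost / cost.max().clamp_min(1e-8)
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)          # uniform weight over student layers
    b = torch.full((m,), 1.0 / m)          # uniform weight over teacher layers
    K = torch.exp(-cost / eps)             # Gibbs kernel
    u = torch.ones(n)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u.unsqueeze(1) * K * v.unsqueeze(0)   # (n, m) transport plan


def many_to_many_distill_loss(student_layers, teacher_layers):
    # student_layers: list of (batch, seq, hidden) tensors, one per student layer
    # teacher_layers: list of (batch, seq, hidden) tensors, one per teacher layer
    s = torch.stack([h.mean(dim=(0, 1)) for h in student_layers])   # (n, hidden)
    t = torch.stack([h.mean(dim=(0, 1)) for h in teacher_layers])   # (m, hidden)
    cost = torch.cdist(s, t)                     # pairwise layer distances
    with torch.no_grad():
        plan = sinkhorn_plan(cost)               # adaptive many-to-many mapping weights
    # Every student-teacher layer pair contributes, weighted by its transport mass.
    return (plan * cost).sum()


if __name__ == "__main__":
    student = [torch.randn(2, 16, 256) for _ in range(6)]
    teacher = [torch.randn(2, 16, 256) for _ in range(12)]
    print(many_to_many_distill_loss(student, teacher).item())
```

Because the plan is recomputed from the current layer representations, the mapping can differ across tasks and training stages rather than being fixed in advance, which is the property the adaptive variant aims for.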
Keywords/Search Tags: Pretrained visual language model, Knowledge distillation, Inter-layer distillation, Many-to-many inter-layer distillation