
Research On Compression Method Of Multimodal Pretrained Model Based On Knowledge Distillation

Posted on: 2023-11-16  Degree: Master  Type: Thesis
Country: China  Candidate: D W Liao  Full Text: PDF
GTID: 2558306617982119  Subject: Computer technology
Abstract/Summary:
Multimodal pre-trained models show strong performance on tasks such as visual-language classification and visual-language generation, enabling applications such as visual question answering, visual navigation, and visual reasoning. However, their large number of parameters makes inference slow and prevents them from running on devices with limited computing resources. Existing research mostly addresses this problem with model compression techniques such as early exiting, pruning, and knowledge distillation, among which knowledge distillation performs best. Yet, owing to the particularities of multimodal tasks, current multimodal knowledge distillation still suffers from problems such as distillation overfitting and poor task adaptation. This thesis therefore studies modality-specific layer-wise distillation and an adaptive variant of it. The main contents are as follows:

(1) Although existing layer-wise distillation methods can alleviate overfitting during distillation, the single-modal outputs of the student model still differ significantly from those of the teacher model, and taking only multimodal samples as input can cause the student model to overfit and the distillation to become inefficient. This thesis therefore proposes a modality-specific inter-layer distillation method that transfers knowledge from the teacher model to the student model more effectively by learning how the teacher model behaves in each modality (a minimal sketch of such a loss follows this abstract).

(2) It is unclear whether the mean-squared-error loss is the best way to measure inter-layer differences and whether it lets the student model learn as efficiently as possible. This thesis therefore conducts experiments with four methods of computing the distributional difference between layers and verifies the effectiveness of the mean squared error. Furthermore, in the layer-mapping step of inter-layer distillation, the original skip-step mapping and last-layer mapping methods cannot fully learn the teacher model's inter-layer knowledge and adapt poorly to different tasks. This thesis therefore proposes a many-to-many modality-specific inter-layer knowledge distillation method based on the Wasserstein distance, which considers the influence of all intermediate teacher layers on all intermediate student layers and adaptively generates a different layer-mapping relationship for each task (see the second sketch after this abstract).

(3) When visual-language models are deployed in mobile computing environments, problems arise such as the lack of on-device image feature extraction and the inability to perform inference without a network connection. This thesis designs a visual-language model deployment framework that addresses these problems in a unified way.
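The following is a minimal sketch, not the thesis's implementation, of the modality-specific inter-layer distillation idea from item (1): the student's intermediate hidden states for the text and image streams are matched separately against the corresponding teacher layers with a mean-squared-error loss. The class name, projection layer, layer dimensions, and the skip-style layer mapping used in the demo are illustrative assumptions.

```python
# Sketch of modality-specific inter-layer (layer-wise) distillation with an MSE loss.
# Assumptions: hidden states per modality are already available as lists of tensors,
# and a linear projection aligns the student's hidden size with the teacher's.
import torch
import torch.nn as nn


class ModalitySpecificLayerDistillLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Project student hidden states into the teacher's hidden size
        # so the MSE is computed in a common space.
        self.proj = nn.Linear(student_dim, teacher_dim)
        self.mse = nn.MSELoss()

    def forward(self, student_layers, teacher_layers, layer_map):
        # student_layers / teacher_layers: dict modality -> list of tensors
        #   of shape (batch, seq_len, hidden)
        # layer_map: list of (student_layer_idx, teacher_layer_idx) pairs
        loss = 0.0
        for modality in ("text", "image"):
            for s_idx, t_idx in layer_map:
                s_hidden = self.proj(student_layers[modality][s_idx])
                t_hidden = teacher_layers[modality][t_idx].detach()
                loss = loss + self.mse(s_hidden, t_hidden)
        return loss


if __name__ == "__main__":
    # Toy usage: a 6-layer student distilled from a 12-layer teacher
    # with a simple skip mapping (student layer i <- teacher layer 2i+1).
    batch, seq, s_dim, t_dim = 2, 16, 384, 768
    student = {m: [torch.randn(batch, seq, s_dim) for _ in range(6)]
               for m in ("text", "image")}
    teacher = {m: [torch.randn(batch, seq, t_dim) for _ in range(12)]
               for m in ("text", "image")}
    layer_map = [(i, 2 * i + 1) for i in range(6)]
    criterion = ModalitySpecificLayerDistillLoss(s_dim, t_dim)
    print(criterion(student, teacher, layer_map).item())
```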
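The second sketch illustrates, under assumptions, the many-to-many layer mapping described in item (2): a transport plan between all student layers and all teacher layers is estimated from their pairwise distances with a few Sinkhorn iterations and then used to weight the layer-to-layer terms. The cost definition, entropic regularization, iteration count, and the use of pooled layer representations are illustrative choices, not the thesis's exact Wasserstein-distance formulation.

```python
# Sketch of many-to-many inter-layer distillation via an optimal-transport-style
# layer mapping. Assumes student and teacher hidden states have already been
# projected to a common dimension.
import torch


def sinkhorn_plan(cost: torch.Tensor, eps: float = 0.1, iters: int = 50) -> torch.Tensor:
    """Approximate entropic optimal-transport plan with uniform marginals."""
    # Scale the cost matrix before exponentiating, for numerical stability.
    cost = cost / cost.max().clamp_min(1e-8)
    n, m = cost.shape
    a = torch.full((n,), 1.0 / n)          # uniform weight over student layers
    b = torch.full((m,), 1.0 / m)          # uniform weight over teacher layers
    K = torch.exp(-cost / eps)             # Gibbs kernel
    u = torch.ones(n)
    for _ in range(iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u.unsqueeze(1) * K * v.unsqueeze(0)   # (n, m) transport plan


def many_to_many_distill_loss(student_layers, teacher_layers):
    # student_layers: list of (batch, seq, hidden) tensors, one per student layer
    # teacher_layers: list of (batch, seq, hidden) tensors, one per teacher layer
    s = torch.stack([h.mean(dim=(0, 1)) for h in student_layers])   # (n, hidden)
    t = torch.stack([h.mean(dim=(0, 1)) for h in teacher_layers])   # (m, hidden)
    cost = torch.cdist(s, t)                     # pairwise layer distances
    with torch.no_grad():
        plan = sinkhorn_plan(cost)               # adaptive many-to-many mapping weights
    # Every student-teacher layer pair contributes, weighted by its transport mass.
    return (plan * cost).sum()


if __name__ == "__main__":
    student = [torch.randn(2, 16, 256) for _ in range(6)]
    teacher = [torch.randn(2, 16, 256) for _ in range(12)]
    print(many_to_many_distill_loss(student, teacher).item())
```

Because the plan is recomputed from the current layer representations, the mapping can differ across tasks and training stages rather than being fixed in advance, which is the property the adaptive variant aims for.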
Keywords/Search Tags: Pretrained visual language model, Knowledge distillation, Inter-layer distillation, Many-to-many inter-layer distillation