
Knowledge Distillation Based Span Extraction For Machine Reading Comprehension

Posted on: 2024-07-07  Degree: Master  Type: Thesis
Country: China  Candidate: J J Cao  Full Text: PDF
GTID: 2568307061485774  Subject: Computer Science and Technology
Abstract/Summary:
With the development of technology, machines once confined to science fiction have entered everyday life, and artificial intelligence is quietly changing how people live while technological innovation drives the vigorous development of society. By the end of 2022, the groundbreaking AIGC product ChatGPT had emerged from the laboratory, allowing more people to experience the impact of artificial intelligence on society and the promise of a new revolution on the Internet. However, the advent of large language models has also raised technological barriers. How to apply large language models efficiently when hardware resources are scarce has become a pressing question, and knowledge distillation, one of the main model compression methods, has become a common way to address it.

As a fundamental task in natural language processing, span extraction machine reading comprehension has a wide range of applications in daily life, such as search engines. Its goal is to have a model infer, from a given passage, the location of the answer to a question within the text. In practice, large language models have become the best performers on span extraction machine reading comprehension. Using knowledge distillation to transfer the knowledge of a large language model to a smaller model can reduce computational cost while still completing the task effectively. Traditional knowledge distillation-based span extraction methods focus on optimizing and modifying the student and teacher models themselves, without considering that when the performance of the student and teacher models differs greatly, the student cannot learn effectively from the teacher. Motivated by this observation, this paper presents two methods that narrow the gap between the student and teacher models by introducing a teacher assistant model and a multi-student model into traditional knowledge distillation:

(1) This paper proposes a knowledge distillation method based on a teacher assistant model to narrow the gap between the student and teacher models. The teacher assistant is a deep learning model whose parameter count and training time lie between those of the student and the teacher on span extraction machine reading comprehension tasks. It learns from the teacher model and in turn instructs the student model, improving the effectiveness of this "learning" process (a minimal sketch of such a two-stage objective appears below). Experiments show that, with the assistant model, the student model learns the teacher model's text representations better.

(2) This paper proposes a knowledge distillation method based on multiple student models to address the problem that a single student model cannot learn effectively from the teacher model. Multiple students are introduced to improve the overall ability to locate the text span containing the answer and to narrow the performance gap between a single student and the teacher; the gap is bridged through the joint learning of the students. Experiments show that, compared with traditional knowledge distillation models, the multi-student model achieves better distillation by learning different "knowledge" from the teacher model.
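The sketch below illustrates, in PyTorch, the kind of distillation objective such a setup typically uses for span extraction: a temperature-scaled soft-label loss against a larger model's start/end distributions combined with a hard-label loss against the annotated span, applied first from teacher to assistant and then from assistant to student. The function name, temperature, and weighting are illustrative assumptions, not the thesis's actual implementation.

    import torch.nn.functional as F

    def span_kd_loss(student_logits, teacher_logits, gold_positions, T=2.0, alpha=0.5):
        # student_logits / teacher_logits: tuples (start_logits, end_logits),
        # each of shape [batch, seq_len];
        # gold_positions: tuple (start_idx, end_idx), each of shape [batch].
        loss = 0.0
        for s_logit, t_logit, gold in zip(student_logits, teacher_logits, gold_positions):
            # Soft targets: temperature-scaled KL divergence against the larger model.
            soft = F.kl_div(
                F.log_softmax(s_logit / T, dim=-1),
                F.softmax(t_logit / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)
            # Hard targets: cross-entropy against the annotated answer span.
            hard = F.cross_entropy(s_logit, gold)
            loss = loss + alpha * soft + (1 - alpha) * hard
        return loss

    # Two-stage use (hypothetical models): the assistant first distills from the
    # teacher, then the student distills from the frozen assistant, e.g.
    #   assistant_loss = span_kd_loss(assistant(batch), teacher(batch), gold)
    #   student_loss   = span_kd_loss(student(batch), assistant(batch), gold)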
Keywords/Search Tags: machine reading comprehension, knowledge distillation, natural language processing, deep learning