Research On Multilingual Text-Matching Algorithms In Low Resource Scenarios

Posted on:2024-04-14

Degree:Master

Type:Thesis

Country:China

Candidate:K B Ding

GTID:2568307070499104

Subject:Software engineering

Abstract/Summary:

Text matching is a core task in natural language processing and underpins intelligent systems such as search engines, dialogue robots, and voice assistants. Monolingual text-matching algorithms have been studied extensively, but in multilingual scenarios their training and deployment face new challenges, chief among them insufficient resources. This thesis therefore studies multilingual text-matching algorithms in low-resource scenarios. In multilingual settings, the resource constraints come from two sources.

The first is model size. Multilingual models are large, which makes them difficult to deploy on memory-limited devices, yet simply shrinking a model causes a significant loss in semantic-learning performance. The first part of this thesis therefore focuses on building a small but high-performing cross-lingual semantic model.

The second is data. Multilingual data increases the cost of training, and high-quality annotated data is hard to obtain for low-resource languages. Existing studies alleviate this problem by fine-tuning multilingual models on annotated data from high-resource languages and by improving cross-lingual transfer with bilingual dictionaries or perturbations. However, most of these methods still rely on expensive bilingual dictionary resources and do not examine cross-lingual transfer in depth. The second part of this thesis therefore investigates cross-lingual transfer learning as a way to improve model performance.

The research on these two issues is as follows:

(1) This thesis proposes a knowledge distillation method that uses an assistant model and a multi-stage distillation framework to shrink the model and learn semantic knowledge at the same time. Within this framework, bottleneck, parameter-recurrence, and contrastive learning strategies are combined to prevent performance from degrading during compression.

(2) This thesis analyzes the limitations of cross-lingual transfer and proposes a new method that applies data augmentation to English data only and combines three new objective functions to improve cross-lingual transfer in the multilingual text-matching model. Experimental results show that the method improves model performance in multiple languages and, using only English labeled data, aligns multiple languages in the model's semantic space, which reduces training costs.
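The first contribution pairs knowledge distillation with contrastive learning. The abstract does not specify the exact losses, so the following is only a minimal sketch of one common way to combine the two, assuming a student that outputs sentence embeddings already projected to the teacher's dimension. An MSE term pulls each student embedding toward the corresponding teacher embedding, and an in-batch InfoNCE term requires each student embedding to match its own teacher embedding more closely than any other teacher embedding in the batch. The function names, the weights `alpha` and `beta`, and the temperature of 0.05 are illustrative assumptions, not values from the thesis. The assistant model, the multi-stage schedule, and the bottleneck and parameter-recurrence components are not modeled here.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, a typical text-matching score between two sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # guard against zero vectors
    return dot / (nu * nv)


def mse_distill_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Feature distillation: mean squared error between student and teacher embeddings.

    Assumption: the student output has already been projected to the teacher's dimension.
    """
    total, count = 0.0, 0
    for s, t in zip(student, teacher):
        if len(s) != len(t):
            raise ValueError("student and teacher embeddings must have the same dimension")
        for a, b in zip(s, t):
            total += (a - b) ** 2
            count += 1
    return total / count


def info_nce_loss(student: List[Vector], teacher: List[Vector],
                  temperature: float = 0.05) -> float:
    """In-batch contrastive loss: student[i] should score teacher[i] above every teacher[j], j != i."""
    losses = []
    for i, s in enumerate(student):
        logits = [cosine(s, t) / temperature for t in teacher]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[i])  # cross-entropy with target index i
    return sum(losses) / len(losses)


def distillation_objective(student: List[Vector], teacher: List[Vector],
                           alpha: float = 1.0, beta: float = 1.0,
                           temperature: float = 0.05) -> float:
    """Hypothetical weighted sum of the distillation and contrastive terms."""
    return (alpha * mse_distill_loss(student, teacher)
            + beta * info_nce_loss(student, teacher, temperature))


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    nearly_aligned = [[0.9, 0.1], [0.1, 0.9]]
    print(distillation_objective(nearly_aligned, teacher))
```

In generic teacher-assistant distillation, an objective of this kind is applied in stages (teacher to assistant, then assistant to student) so that the size gap bridged at each step stays small. How the thesis's multi-stage framework actually schedules its losses is not stated in the abstract.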

Keywords/Search Tags:

Multilingual, Text-matching, Low-resource, Knowledge distillation, Cross-lingual transfer learning, Contrastive learning

Related items

1	Research And System Implementation Of Text Matching Algorithm Based On Contrastive Learning
2	Research On Semantic Similarity Of Text Based On Unsupervised Contrastive Learning
3	A Study On Multilingual Representation Learning And Application Based On Pre-Trained Language Model
4	A Study On Low-resource Multilingual Speech Recognition Based On Transfer Learning
5	Telecom Complaint Text Classification Based On Adversarial Training And Contrastive Learning
6	Research On Yes/no Question Answering Based On Transfer Learning
7	Research On Joint Information Extraction Methods In Low Resource Situations
8	Federated Learning Based On Knowledge Distillation
9	Video Text Retrieval Based On Dynamic Distillation Learning
10	Research And Implementation Of OCR Algorithm Based On Text Knowledge Transfer