| Text matching is a core task in natural language processing that supports intelligent systems such as search engines, dialogue robots, and voice assistants. Monolingual text-matching algorithms have been studied extensively, but in multilingual scenarios the training and deployment of these algorithms face new challenges, such as insufficient resources. This thesis therefore studies multilingual text-matching algorithms in low-resource scenarios. In multilingual scenarios, resource constraints come mainly from two sources. First, multilingual models are very large, which makes them difficult to deploy on memory-limited devices, yet in semantic learning simply shrinking the model causes a significant loss in performance. The first part of this thesis therefore focuses on building a small but high-performance cross-lingual semantic model. Second, multilingual data increases the training cost of the model, and high-quality annotated data is difficult to obtain for low-resource languages. To alleviate this problem, existing studies fine-tune multilingual models on annotated data from high-resource languages and improve cross-lingual transfer learning using bilingual dictionaries or perturbations. However, most of these methods still rely on expensive bilingual dictionary resources and lack an in-depth discussion of cross-lingual transfer. The second part of this thesis therefore explores cross-lingual transfer learning and improves model performance. The research on these two issues is as follows: (1) This thesis proposes a knowledge distillation method that uses an assistant model and a multi-stage distillation framework to shrink the model and learn semantic knowledge simultaneously. Within this framework, bottleneck, parameter-recurrent, and contrastive learning strategies are combined to prevent performance from degrading during compression. (2) This thesis analyzes the limitations of cross-lingual transfer and proposes a new method. The method applies data augmentation to English only and combines three new objective functions to improve cross-lingual transfer in the multilingual text-matching model. Experimental results show that the method improves model performance in multiple languages and uses only English labeled data to align multiple languages in the model's semantic space, reducing training costs. |