Question Answer Matching is an important research field in Natural Language Understanding with wide practical applications, such as information retrieval, intelligent question answering, and dialog systems. The task can be divided into two sub-tasks: answer selection and question paraphrase identification. Answer selection requires the model to evaluate the semantic relevance between a given question and its candidate answers, i.e., the "question-answer" scenario. Question paraphrase identification, on the other hand, requires the model to judge the semantic consistency between two given questions, i.e., the "question-question" scenario. Pre-trained language models have brought breakthroughs to Question Answer Matching, and matching methods built on them have made remarkable progress. However, current state-of-the-art methods still face two limitations: weak key semantic extraction ability and poor query intention perception ability. In this paper, we conduct research in the following three areas to address these issues: 1) Existing pre-trained models have difficulty focusing on and grasping key semantics when processing longer texts, and even slight perturbations can change their predictions. Adversarial examples, as a challenging data augmentation technique, have shown good performance in addressing these problems. Therefore, we design a bi-granularity adversarial training method that generates high-quality adversarial examples, effectively improving the model’s ability to extract key semantics from long texts. 2) Question paraphrase identification faces the problem of resolving difficult examples in "homogeneous but isomerous" and "heterogeneous but isomorphic" cases. Traditional challenging data augmentation techniques suffer from low time efficiency and poor example quality. Therefore, we design
a generative challenging data augmentation method, significantly improving the efficiency and quality of example generation. Moreover, this paper provides a training method for a "wrong cases bank," which effectively improves the model’s performance by extrapolating from difficult instances. 3) Existing methods based on pre-trained language models do not effectively utilize the query intention information of questions when identifying question paraphrases. As a result, the model ignores intention information and is easily confused by literal similarities or differences. Therefore, we design a variational autoencoder-based query intention-aware model, which effectively enhances the model’s perception of query intention. From the three perspectives of adversarial training, generative boosting, and query intention, this paper alleviates the problems of weak key semantic extraction ability and poor query intention perception ability in question answer matching. Experimental results on publicly available datasets, including WPQA, TrecQA, LCQMC, BQ, and QQP, demonstrate the effectiveness of our proposed methods.