
Multi-domain Dialog Policy Learning Based On Multi-agent Reinforcement Learning

Posted on: 2023-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: L Tang
GTID: 2558307154974499
Subject: Computer Science and Technology
Abstract/Summary:
Task-oriented dialogue systems aim to help people complete tasks such as booking air tickets and hotels, and the dialog policy is one of their most important modules. The performance of the dialog policy determines the success of a human-machine dialogue system. The goal of the dialog policy is to guide the conversation and help users achieve their goals, based on the users' demands and the dialogue history. Recently, reinforcement learning has become the main method for learning dialog policies. In reinforcement learning, the dialogue process is regarded as a sequential decision-making process, and the key information in the dialogue history is represented as the dialog state. A reward function is used to evaluate the dialogue and each decision. A dialog policy that maximizes the expected reward is obtained by exploring the dialog state and dialog action spaces. Each distinct scenario involved in a dialogue is called a domain. As the number of domains involved in a dialogue increases, the dialog state space and action space grow sharply, which makes policy exploration difficult for the reinforcement learning model and makes it hard to obtain the best dialog policy within limited interactive training. Another problem in multi-domain dialog policy learning is sparse dialogue data. It is difficult to collect a large dialog corpus in some domains, so the dialog policy in those domains cannot be fully trained, which ultimately degrades the whole dialog policy model.

To study dialog policy in multi-domain scenarios, the main work and contributions of this thesis are as follows.

First, to address the difficulty of policy exploration in the huge dialog state and action spaces of multi-domain dialogue, this thesis proposes using multi-agent reinforcement learning to model the dialog policy in multi-domain scenarios. In reinforcement learning, an agent usually represents a policy model that outputs decision-making actions. The proposed method partitions the original dialog state and action space into a number of smaller state and action spaces according to specific domains. For each domain, a dedicated agent policy is trained to make decisions for each turn involving that domain. Experiments on the multi-domain dialogue dataset MultiWOZ show that, compared with the baseline, this method increases the dialog success rate from 55.0% to 67.2%.

Second, after partitioning, only a small amount of dialogue data remains in some domains, which makes it difficult to fully train the agent policies in those domains. Building on the multi-agent reinforcement learning method above, transfer learning is further used to address the sparse-corpus problem in multi-domain dialogue. This method first trains a general policy module using the dialogue corpus of all domains and modifies the policy network according to the action space of each domain. It then uses a small amount of data from each domain to fine-tune the policy module so that it better adapts to the ontology knowledge of that domain. Experiments on MultiWOZ show that, compared with the previous method, the dialog success rate is further improved from 67.2% to 76.4%.
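The two ideas in the abstract, per-domain agents and initialization from a general policy, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the domain names and action lists are hypothetical (the real MultiWOZ ontology is much larger), the agents are tabular rather than neural networks, and a simple bandit-style value update stands in for the reinforcement learning algorithm, which the abstract does not specify.

```python
import random
from dataclasses import dataclass, field

# Hypothetical per-domain action sets, not taken from the thesis.
DOMAIN_ACTIONS = {
    "hotel": ["request_area", "inform_price", "book_hotel"],
    "train": ["request_day", "inform_departure", "book_train"],
}


@dataclass
class DomainAgent:
    """Tabular stand-in for one policy over a (possibly restricted) action space."""
    actions: list
    q: dict = field(default_factory=dict)  # (state, action) -> value estimate

    def act(self, state, epsilon=0.0):
        # Epsilon-greedy choice restricted to this agent's own actions.
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, lr=0.5):
        # Bandit-style update: move the estimate toward the observed reward.
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + lr * (reward - old)


def from_general(general, actions):
    """Tabular analogy of transfer: copy the general policy's estimates,
    keeping only the actions that exist in the target domain."""
    agent = DomainAgent(list(actions))
    agent.q = {(s, a): v for (s, a), v in general.q.items() if a in actions}
    return agent


class MultiDomainPolicy:
    """Routes each turn to the agent of the domain that turn involves."""

    def __init__(self, domain_actions, general=None):
        self.agents = {
            d: from_general(general, acts) if general else DomainAgent(list(acts))
            for d, acts in domain_actions.items()
        }

    def act(self, domain, state, epsilon=0.0):
        return self.agents[domain].act(state, epsilon)


# Usage: a general agent sees data from all domains, then seeds each domain agent.
all_actions = [a for acts in DOMAIN_ACTIONS.values() for a in acts]
general = DomainAgent(all_actions)
general.update("need_area", "request_area", 1.0)
policy = MultiDomainPolicy(DOMAIN_ACTIONS, general=general)
chosen = policy.act("hotel", "need_area")
```

Each domain agent only ever explores its own small action list, which is the point of the partitioning. After the copy, further calls to `update` on a domain agent would play the role of fine-tuning on that domain's data.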
Keywords/Search Tags: Task-oriented dialogue system, Dialog policy learning, Multi-agent reinforcement learning, Transfer learning