Research On Task-oriented Dialogue Policy Based On Deep Reinforcement Learning

Posted on: 2022-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: K Yin
Full Text: PDF
GTID: 2518306569975869
Subject: Software engineering
Abstract/Summary:
Dialogue policy is a submodule of dialogue management in a task-oriented dialogue system; its main task is to decide the next system action according to the current dialogue state. Because dialogue is inherently a sequential decision-making process, a large body of research has approached dialogue policy through reinforcement learning. However, reinforcement-learning-based dialogue policy models still face the following problems: (1) Positive rewards in a task-oriented dialogue system are very sparse, so it is hard for the system to obtain the successful dialogue samples and positive rewards that make learning efficient. To establish an effective policy, the system must hold a large number of conversations with users or a user simulator, so policy convergence is slow. (2) For dialogue tasks with a large system-action space, the policy model has difficulty exploring during learning and often selects the wrong action when making decisions, leading to a low success rate for the final policy.

To address problem (1), this thesis proposes a policy-learning method that integrates a reverse curriculum with goal-distance reward shaping. Intermediate-difficulty dialogue states are defined in combination with the user goal, and the system's learning process is planned so that it can quickly obtain positive rewards. At the same time, a goal-distance measure is defined and a goal-distance evaluation model is introduced to predict the distance from the current state to a successful state; an additional reward, computed from the distance difference between adjacent dialogue states, is added for reward shaping. Experimental results on two task-oriented dialogue tasks show that this method effectively alleviates the sparse-reward problem and improves the efficiency of policy learning.

For problem (2), an analysis of failed dialogues shows that the system often answers questions the user never asked, or fails to respond to the user's question in time. In view of this, we introduce a dialogue policy learning model based on action-space decomposition. A top-level abstract action space is extracted from the original large action space, and two Q-functions are then learned respectively. Because the abstract space is smaller, its Q-function can be learned faster, helping the system make the right decision, reducing decision difficulty, and increasing the dialogue success rate. Experimental results on two task-oriented dialogue tasks show that this method effectively improves the dialogue success rate.
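The goal-distance reward shaping described above can be sketched as potential-based shaping, where the potential of a state is the negative of its predicted distance to a successful state. This is a minimal illustration: the thesis trains a learned model to predict that distance, whereas the `d_prev`/`d_curr` values below are just hypothetical outputs of such a model.

```python
# Sketch of potential-based reward shaping driven by a goal-distance estimate.
# `d_prev` and `d_curr` stand in for a learned distance model's predictions
# for the previous and current dialogue states (illustrative values only).

def shaped_reward(env_reward, d_prev, d_curr, gamma=0.99):
    """Add F(s, s') = gamma * Phi(s') - Phi(s), with potential Phi(s) = -distance(s).

    A step that moves closer to the goal (d_curr < d_prev) earns a positive
    bonus, densifying the otherwise sparse task reward.
    """
    return env_reward + gamma * (-d_curr) - (-d_prev)

# Moving from distance 3 to distance 2 yields a positive shaped reward
# even though the environment reward for the step is still 0.
bonus = shaped_reward(env_reward=0.0, d_prev=3.0, d_curr=2.0)
```

Using the potential-based form keeps the optimal policy unchanged while still rewarding intermediate progress toward the user goal.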
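The action-space decomposition can be pictured as two nested selections: a small top-level Q-function first chooses an abstract action group, and a second Q-function then chooses a primitive action within that group. The action names, grouping, and tabular Q-values below are invented for illustration; the thesis's actual model learns these functions with deep networks over a much larger action set.

```python
import random

# Hypothetical decomposition: each abstract action maps to a subset of the
# original primitive actions (names are illustrative, not the thesis's ontology).
ABSTRACT_TO_PRIMITIVE = {
    "request": ["request_area", "request_price", "request_food"],
    "inform":  ["inform_name", "inform_phone"],
}

def select_action(state_key, q_top, q_low, epsilon=0.1):
    """Pick an (abstract, primitive) pair via two epsilon-greedy choices.

    q_top maps (state, abstract_action) -> value; because the abstract space
    is small, it is learned quickly and narrows the search for q_low, which
    maps (state, abstract_action, primitive_action) -> value.
    """
    if random.random() < epsilon:
        abstract = random.choice(list(ABSTRACT_TO_PRIMITIVE))
    else:
        abstract = max(ABSTRACT_TO_PRIMITIVE,
                       key=lambda a: q_top.get((state_key, a), 0.0))
    primitives = ABSTRACT_TO_PRIMITIVE[abstract]
    if random.random() < epsilon:
        return abstract, random.choice(primitives)
    primitive = max(primitives,
                    key=lambda p: q_low.get((state_key, abstract, p), 0.0))
    return abstract, primitive
```

With exploration disabled, the selection is fully determined by the two Q-tables, so the smaller top-level table effectively prunes most of the primitive action space before the low-level choice is made.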
Keywords/Search Tags:Task-oriented Dialogue System, Dialogue Policy, Reinforcement Learning