Research On Task-oriented Dialogue Policy Based On Deep Reinforcement Learning

Posted on: 2022-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: K Yin
Full Text: PDF
GTID: 2518306569975869
Subject: Software engineering
Abstract/Summary:
Dialogue policy is a submodule of dialogue management in a task-oriented dialogue system; its main task is to decide the next system action according to the current dialogue state. Because dialogue is inherently a sequential decision-making process, a large body of research has approached dialogue policy through reinforcement learning. However, reinforcement-learning-based dialogue policy models still face the following problems: (1) Positive rewards in a task-oriented dialogue system are very sparse, so it is hard for the system to obtain the successful dialogue samples and positive rewards that make learning efficient. To establish an effective policy, the system must hold a large number of conversations with users or a user simulator, so policy convergence is slow. (2) For dialogue tasks with a large system-action space, the policy model has difficulty exploring during learning and often selects the wrong action when making decisions, leading to a low success rate for the final policy.

To address problem (1), this thesis proposes a policy-learning method that integrates a reverse curriculum with goal-distance reward shaping. Intermediate-difficulty dialogue states are defined in combination with the user goal, and the system's learning process is planned so that it can quickly obtain positive rewards. At the same time, a goal-distance measure is defined and a goal-distance evaluation model is introduced to predict the distance from the current state to a successful state; an additional reward, computed from the distance difference between adjacent dialogue states, is added for reward shaping. Experimental results on two task-oriented dialogue tasks show that this method effectively alleviates the sparse-reward problem and improves the efficiency of policy learning.

For problem (2), an analysis of failed dialogues shows that the system often answers questions the user never asked, or fails to respond to the user's question in time. In view of this, we introduce a dialogue policy learning model based on action-space decomposition. A top-level abstract action space is extracted from the original large action space, and two Q-functions are then learned respectively. Because the abstract space is smaller, its Q-function can be learned faster, helping the system make the right decision, reducing decision difficulty, and increasing the dialogue success rate. Experimental results on two task-oriented dialogue tasks show that this method effectively improves the dialogue success rate.
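The goal-distance reward shaping described above can be sketched as potential-based shaping, where the potential of a state is the negative of its predicted distance to a successful state. This is a minimal illustration: the thesis trains a learned model to predict that distance, whereas the `d_prev`/`d_curr` values below are just hypothetical outputs of such a model.

```python
# Sketch of potential-based reward shaping driven by a goal-distance estimate.
# `d_prev` and `d_curr` stand in for a learned distance model's predictions
# for the previous and current dialogue states (illustrative values only).

def shaped_reward(env_reward, d_prev, d_curr, gamma=0.99):
    """Add F(s, s') = gamma * Phi(s') - Phi(s), with potential Phi(s) = -distance(s).

    A step that moves closer to the goal (d_curr < d_prev) earns a positive
    bonus, densifying the otherwise sparse task reward.
    """
    return env_reward + gamma * (-d_curr) - (-d_prev)

# Moving from distance 3 to distance 2 yields a positive shaped reward
# even though the environment reward for the step is still 0.
bonus = shaped_reward(env_reward=0.0, d_prev=3.0, d_curr=2.0)
```

Using the potential-based form keeps the optimal policy unchanged while still rewarding intermediate progress toward the user goal.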
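The action-space decomposition can be pictured as two nested selections: a small top-level Q-function first chooses an abstract action group, and a second Q-function then chooses a primitive action within that group. The action names, grouping, and tabular Q-values below are invented for illustration; the thesis's actual model learns these functions with deep networks over a much larger action set.

```python
import random

# Hypothetical decomposition: each abstract action maps to a subset of the
# original primitive actions (names are illustrative, not the thesis's ontology).
ABSTRACT_TO_PRIMITIVE = {
    "request": ["request_area", "request_price", "request_food"],
    "inform":  ["inform_name", "inform_phone"],
}

def select_action(state_key, q_top, q_low, epsilon=0.1):
    """Pick an (abstract, primitive) pair via two epsilon-greedy choices.

    q_top maps (state, abstract_action) -> value; because the abstract space
    is small, it is learned quickly and narrows the search for q_low, which
    maps (state, abstract_action, primitive_action) -> value.
    """
    if random.random() < epsilon:
        abstract = random.choice(list(ABSTRACT_TO_PRIMITIVE))
    else:
        abstract = max(ABSTRACT_TO_PRIMITIVE,
                       key=lambda a: q_top.get((state_key, a), 0.0))
    primitives = ABSTRACT_TO_PRIMITIVE[abstract]
    if random.random() < epsilon:
        return abstract, random.choice(primitives)
    primitive = max(primitives,
                    key=lambda p: q_low.get((state_key, abstract, p), 0.0))
    return abstract, primitive
```

With exploration disabled, the selection is fully determined by the two Q-tables, so the smaller top-level table effectively prunes most of the primitive action space before the low-level choice is made.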
Keywords/Search Tags:Task-oriented Dialogue System, Dialogue Policy, Reinforcement Learning