Sample Augmentation Based Reinforcement Learning For Dialogue Management

Posted on:2020-01-18

Degree:Master

Type:Thesis

Country:China

Candidate:K T Lu

Full Text:PDF

GTID:2428330572988155

Subject:Computer application technology

Abstract/Summary:

Goal-oriented dialogue systems have been widely applied in chatbots and personal assistants such as Apple Siri, Microsoft Cortana and Google Home. Dialogue policies, which play an important role in responding well to users, are usually learned by reinforcement learning (RL). With RL, a goal-oriented dialogue system can gradually reach a good policy by interacting with users. However, RL approaches often need a large number of dialogue interactions before they arrive at a good dialogue policy. This not only raises the training cost but also lowers user satisfaction, because the policy is poor during the early learning phase. It is therefore crucial to speed up dialogue policy learning within a limited number of dialogue interactions.

To accelerate dialogue policy learning, this thesis proposes a new dialogue data augmentation approach that generates many successful dialogue samples from failed ones. The approach significantly increases the number of successful dialogue samples obtained within a limited number of interactions and thus accelerates dialogue policy learning. We propose two data augmentation approaches, trimming-based hindsight experience replay (T-HER) and stitching-based hindsight experience replay (S-HER). T-HER generates a successful dialogue sample by trimming, from a failed dialogue, a segment that completes some simpler task. S-HER generates a successful dialogue by stitching together two dialogue segments, one taken from a failed dialogue sample and the other from a successful dialogue sample. We also develop an automated S-HER, which generates new dialogue samples of different qualities by automatically adjusting the stitching strategy during dialogue training. Dialogue samples generated by T-HER are usually quite short and can thus help a goal-oriented dialogue system learn simple tasks first, whereas dialogue samples generated by S-HER are as long as real samples, so these more complete training samples help reinforcement learning learn how to finish complete tasks.

We have implemented our approaches on TC-Bot, a widely used dialogue platform, and evaluated the efficiency and effectiveness of both. Compared with a baseline, our methods significantly accelerate dialogue policy learning. Combining our approaches with prioritized experience replay accelerates dialogue policy learning further.
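The two augmentation strategies can be made concrete with a small sketch. The abstract does not specify data structures, so everything below is an assumption rather than the author's implementation: a dialogue is a list of turns whose state is reduced to the set of slots filled so far, slots are assumed only to accumulate, a dialogue succeeds once every slot of the user goal is filled, and the reward constants (`MAX_TURN`, `TURN_PENALTY`, `SUCCESS_REWARD`) are hypothetical values loosely resembling TC-Bot-style shaping. The names `relabel`, `t_her`, `s_her` and `episode` are illustrative only, and exact state matching is just one possible stitching strategy.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List, Optional

# Hypothetical reward constants (assumption: a per-turn penalty plus a success bonus).
MAX_TURN = 20
TURN_PENALTY = -1
SUCCESS_REWARD = 2 * MAX_TURN

Slots = FrozenSet[str]  # a dialogue state reduced to the set of filled slots


@dataclass(frozen=True)
class Transition:
    state: Slots       # slots filled before the agent's turn
    action: str        # agent dialogue act (opaque in this sketch)
    next_state: Slots  # slots filled after the turn
    reward: float
    done: bool


def relabel(turns: List[Transition], goal: Slots) -> List[Transition]:
    """Hindsight relabeling: recompute rewards as if `goal` had been the user goal.
    The relabeled dialogue ends at the first turn whose state covers the goal."""
    out = []
    for t in turns:
        success = goal <= t.next_state
        out.append(replace(t, reward=SUCCESS_REWARD if success else TURN_PENALTY,
                           done=success))
        if success:
            break
    return out


def t_her(failed: List[Transition]) -> List[List[Transition]]:
    """Trimming sketch: each prefix of a failed dialogue that ends with progress
    becomes a short successful dialogue for the simpler goal it actually achieved.
    Assumes slots only accumulate, so that goal is first met at the prefix's end."""
    samples = []
    for k, t in enumerate(failed, start=1):
        if t.next_state and t.next_state != t.state:  # this turn filled a new slot
            samples.append(relabel(failed[:k], t.next_state))
    return samples


def s_her(failed: List[Transition], success: List[Transition],
          goal: Slots) -> Optional[List[Transition]]:
    """Stitching sketch: join a failed prefix to a successful suffix that starts
    from the same state, then relabel with the successful dialogue's goal.
    Exact state equality is one hypothetical stitching strategy."""
    for i in range(len(failed), 0, -1):  # prefer the longest failed prefix
        for j, t in enumerate(success):
            if t.state == failed[i - 1].next_state:
                return relabel(failed[:i] + success[j:], goal)
    return None


def episode(*fills: Slots) -> List[Transition]:
    """Hypothetical helper: build turns from successive filled-slot sets."""
    return [Transition(a, f"act{i}", b, TURN_PENALTY, False)
            for i, (a, b) in enumerate(zip(fills, fills[1:]))]


# Usage: a failed booking dialogue that never filled "theater",
# and a successful one that did.
E = frozenset
failed = episode(E(), E({"movie"}), E({"movie", "date"}), E({"movie", "date"}))
goal = E({"movie", "date", "theater"})
success = relabel(episode(E(), E({"movie"}), E({"movie", "date"}), goal), goal)

short_samples = t_her(failed)            # trimmed, goal-achieving prefixes
stitched = s_her(failed, success, goal)  # failed prefix + successful suffix
```

In this toy run, trimming yields two short successful samples (one and two turns), while stitching yields a four-turn successful sample as long as a real dialogue, mirroring the short-versus-complete contrast described above. Exact state matching keeps stitched dialogues consistent but may find few stitch points; an automated S-HER as described in the abstract would adjust the stitching strategy during training, which this sketch does not attempt.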

Keywords/Search Tags:

Goal-oriented dialogue system, reinforcement learning, dialogue policy, data augmentation

Related items

1	Research And Application Of Self-dialogue In Dialogue Systems Based On Reinforcement Learning
2	Research On Task-oriented Dialogue Policy Based On Deep Reinforcement Learning
3	Research On The Key Technology Of Task-Oriented Dialogue Policies Based On The Deep Reinforcement Learning
4	Research On Dialogue Policy Learning In Task-oriented Dialogue System
5	Proactive Mixed-type Dialogue Systems
6	Optimizing Of Dialogue Policy In Human-computer Spoken Dialogue System Based On Reinforcement Learning
7	Research On Key Technology And Application Of Task-oriented Dialogue System
8	Research And Implementation Of Task-Oriented Dialogue System For Government Affairs
9	Research On Knowledge Driven Human-machine Active Dialogue Strategy
10	Research On End-to-End Task-oriented Dialogue System Based On Deep Learning