Sample Augmentation Based Reinforcement Learning For Dialogue Management

Posted on: 2020-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: K T Lu
GTID: 2428330572988155
Subject: Computer application technology
Abstract/Summary:
Goal-oriented dialogue systems have been widely applied in chatbots and personal assistants such as Apple Siri, Microsoft Cortana and Google Home. Dialogue policies, which play an important role in responding well to users, are usually learned by reinforcement learning (RL). With RL, a goal-oriented dialogue system can gradually arrive at a good policy by interacting with users. However, RL approaches often need a large number of dialogue interactions before they obtain a good dialogue policy. This not only raises the training cost but also lowers user satisfaction, because the policy performs poorly in the early learning phase. It is therefore crucial to speed up dialogue policy learning when the number of dialogue interactions is limited. To this end, this thesis proposes a new dialogue data augmentation approach that generates many successful dialogue samples from failed ones. The approach significantly increases the number of successful dialogue samples available within a limited number of interactions and thus accelerates dialogue policy learning.

We propose two data augmentation approaches: trimming-based hindsight experience replay (T-HER) and stitching-based hindsight experience replay (S-HER). T-HER generates a successful dialogue sample by trimming, from a failed dialogue, a segment that completes some simple task. S-HER generates a successful dialogue by stitching together two dialogue segments, one taken from a failed dialogue sample and one from a successful dialogue sample. We also develop an automated S-HER, which generates new dialogue samples of varying quality by automatically adjusting the stitching strategy during training. Dialogue samples generated by T-HER are usually quite short, so they can help a goal-oriented dialogue system learn simple tasks first. Samples generated by S-HER, in contrast, are as long as real samples, and these more complete training samples help reinforcement learning learn how to finish complete tasks.

We have implemented our approaches in TC-Bot, a widely used dialogue platform, and evaluated the efficiency and effectiveness of both approaches. Compared with a baseline, our methods significantly accelerate dialogue policy learning. Combining them with prioritized experience replay accelerates dialogue policy learning further.
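The abstract describes T-HER and S-HER only at a high level. The toy Python sketch below is a rough illustration of the two ideas under simplifying assumptions. A dialogue state is modeled as the set of slots the agent has informed so far, and the user goal is a set of slots that must be informed. The sketch relies on several hypothetical choices, none of which come from the thesis's implementation or TC-Bot's API:

- the names `Turn`, `Dialogue`, `make_dialogue`, `t_her` and `s_her`;
- the reward values;
- the rule that stitching happens only where two dialogue states match exactly.

The automated S-HER, which adjusts the stitching strategy during training, is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Toy model (hypothetical): state = set of slots informed so far.
# Reward values are assumptions, not TC-Bot's actual settings.
STEP_PENALTY = -1.0
SUCCESS_REWARD = 40.0
FAILURE_REWARD = -20.0


@dataclass(frozen=True)
class Turn:
    state: FrozenSet[str]       # slots informed before this turn
    action: str                 # slot the agent informs in this turn
    next_state: FrozenSet[str]  # slots informed after this turn
    reward: float
    done: bool


@dataclass(frozen=True)
class Dialogue:
    goal: FrozenSet[str]        # slots the user needs to be informed about
    turns: Tuple[Turn, ...]

    def succeeded(self) -> bool:
        last = self.turns[-1]
        return last.done and self.goal <= last.next_state


def make_dialogue(goal: Iterable[str], actions: List[str]) -> Dialogue:
    """Build a toy dialogue in which the agent informs one slot per turn."""
    goal_set = frozenset(goal)
    turns, state = [], frozenset()
    for n, slot in enumerate(actions):
        nxt = state | {slot}
        final = n == len(actions) - 1
        if final:
            reward = SUCCESS_REWARD if goal_set <= nxt else FAILURE_REWARD
        else:
            reward = STEP_PENALTY
        turns.append(Turn(state, slot, nxt, reward, final))
        state = nxt
    return Dialogue(goal_set, tuple(turns))


def t_her(failed: Dialogue, k: int) -> Dialogue:
    """Trimming: keep the first k turns and relabel them as a success for the
    sub-goal they actually achieved (hindsight goal relabeling)."""
    if not 1 <= k <= len(failed.turns):
        raise ValueError("k must be between 1 and the dialogue length")
    prefix = failed.turns[:k]
    last = replace(prefix[-1], reward=SUCCESS_REWARD, done=True)
    return Dialogue(goal=last.next_state, turns=prefix[:-1] + (last,))


def s_her(failed: Dialogue, success: Dialogue) -> Optional[Dialogue]:
    """Stitching: join a prefix of a failed dialogue to the suffix of a
    successful one at the first point where the two states match exactly
    (the exact-match rule is an assumption of this sketch)."""
    for i, f_turn in enumerate(failed.turns[:-1]):  # skip the failing final turn
        for j, s_turn in enumerate(success.turns):
            if f_turn.next_state == s_turn.state:
                turns = failed.turns[:i + 1] + success.turns[j:]
                return Dialogue(goal=success.goal, turns=turns)
    return None


failed = make_dialogue({"date", "theater", "price"}, ["time", "date", "city"])
good = make_dialogue({"date", "time", "theater"}, ["date", "time", "theater"])
short = t_her(failed, 2)        # 2-turn success for the goal {"time", "date"}
stitched = s_her(failed, good)  # failed turns "time", "date" + good turn "theater"
```

In this example, T-HER turns the first two turns of the failed dialogue into a short success whose relabeled goal is {"time", "date"}. S-HER produces a full-length success by reusing the failed dialogue's opening and the successful dialogue's final turn. Requiring an exact state match keeps every stitched transition consistent. A looser match would likely yield more stitched samples, but noisier ones.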
Keywords/Search Tags: Goal-oriented dialogue system, reinforcement learning, dialogue policy, data augmentation