Research And Application Of Dialog Policy Module Based On Multi-Agent Reinforcement Learning

Posted on: 2024-08-15

Degree: Master

Type: Thesis

Country: China

Candidate: S Zhang

GTID: 2568306914482604

Subject: Intelligent Science and Technology

Abstract/Summary:

Task-oriented dialog systems are widely used in daily life to complete specific tasks. The dialog policy module, an important component of such systems, guides the direction of the conversation and determines both the user experience and whether the dialog succeeds. Existing research either trains the dialog policy module separately or trains it synchronously with the other modules of the dialog system. The former ignores the influence of the other modules on the dialog policy, so the policy cannot tolerate their errors. The latter is unstable because the modules influence one another during training. To address these issues, and building on a broad survey of prior work, this thesis studies the dialog policy module based on multi-agent reinforcement learning (MARL). The specific work is as follows.

An asynchronous reinforcement learning framework is proposed for training task-oriented dialog systems. Asynchronous updating means that, during joint training, the dialog policy module and the other modules are updated at different frequencies. The framework has three characteristics.

First, the dialog policy module is modeled and trained as part of the entire dialog system, and the dialog state tracking (DST) module and the dialog policy module are updated asynchronously, which alleviates the mutual influence between modules.

Second, curriculum learning is introduced to adjust the training samples and training process of the dialog state tracking module. User actions are divided into three levels, from easy to difficult, according to the module's accuracy on each kind of user action, and targeted training is applied to the samples the model finds difficult to learn.

Third, a user model is constructed to assist in training the dialog system. It increases the diversity of simulated users and of the dialog data the system collects, making training more thorough. In addition, the existing reward design is improved, and both the user model and the dialog system model are trained with reinforcement learning, which improves the accuracy of the user model while the dialog system is being trained.

This thesis further constructs a dataset for collecting phone numbers. The task is a sub-slot-based task-oriented dialog task, and the dataset is more realistic than existing dialog data and contains more complex dialog actions, making it the most action-rich single-domain task-oriented dialog dataset. Experimental results on this dataset show that the proposed method outperforms existing typical reinforcement learning methods.

Finally, a dialog system application is designed and implemented based on the dialog policy model and training method proposed in this thesis. The system has been partially deployed in a real customer service system for testing, and the results show that it can effectively collect complex user phone numbers through multi-turn dialog.
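
The abstract defines asynchronous updating only as "different update frequencies" for the policy and the other modules; it does not give the periods or say which module is updated more often. The minimal Python sketch below assumes the policy is updated every step and the DST module every fourth step. `Module`, `joint_train`, `policy_every`, and `dst_every` are hypothetical names, and the stub modules only count their updates so that the schedule itself is visible.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical stand-in for a trainable module; it only counts its updates."""
    name: str
    updates: int = 0

    def update(self, batch: str) -> None:
        # A real module would compute a loss on `batch` and take an optimizer step.
        self.updates += 1


def joint_train(policy: Module, dst: Module, num_steps: int,
                policy_every: int = 1, dst_every: int = 4):
    """Jointly train two modules with different (assumed) update periods.

    Both modules share one stream of collected dialogs, but each is updated
    only on steps that are multiples of its own period, so the less frequently
    updated module stays fixed while the other adapts to it.
    """
    log = []
    for step in range(1, num_steps + 1):
        batch = f"dialogs@{step}"  # placeholder for rollouts of the full system
        updated = []
        if step % policy_every == 0:
            policy.update(batch)
            updated.append(policy.name)
        if step % dst_every == 0:
            dst.update(batch)
            updated.append(dst.name)
        log.append((step, tuple(updated)))
    return log


policy, dst = Module("policy"), Module("dst")
log = joint_train(policy, dst, num_steps=8)
print(policy.updates, dst.updates)  # 8 2
print(log[:4])  # [(1, ('policy',)), (2, ('policy',)), (3, ('policy',)), (4, ('policy', 'dst'))]
```

Which module gets the longer period is a design choice the abstract leaves open; with a longer DST period, the policy sees the same DST behavior for several consecutive updates.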
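
The curriculum step is described only as splitting user actions into three difficulty levels by DST accuracy and giving hard samples targeted training. The Python sketch below shows one possible realization under stated assumptions: the accuracy thresholds (0.9 and 0.7), the sampling weights (1, 2, 4), the phone-number action names, and the functions `accuracy_by_action`, `difficulty_levels`, and `curriculum_sample` are all hypothetical, and weighted oversampling stands in for whatever targeted-training scheme the thesis actually uses.

```python
import random
from collections import defaultdict


def accuracy_by_action(records):
    """Per-action DST accuracy from (user_action, dst_was_correct) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for action, correct in records:
        totals[action] += 1
        hits[action] += int(correct)
    return {a: hits[a] / totals[a] for a in totals}


def difficulty_levels(acc, easy_min=0.9, medium_min=0.7):
    """Map each action to 0 (easy), 1 (medium) or 2 (hard); thresholds are assumptions."""
    levels = {}
    for action, a in acc.items():
        if a >= easy_min:
            levels[action] = 0
        elif a >= medium_min:
            levels[action] = 1
        else:
            levels[action] = 2
    return levels


def curriculum_sample(samples, levels, k, weights=(1.0, 2.0, 4.0), rng=None):
    """Draw k samples with replacement, oversampling actions at harder levels."""
    rng = rng or random.Random(0)
    w = [weights[levels[s["action"]]] for s in samples]
    return rng.choices(samples, weights=w, k=k)


# Hypothetical DST evaluation results for three phone-number user actions.
records = (
    [("inform_digits", True)] * 19 + [("inform_digits", False)]
    + [("correct_digit", True)] * 8 + [("correct_digit", False)] * 2
    + [("split_number", True)] * 5 + [("split_number", False)] * 5
)
acc = accuracy_by_action(records)
levels = difficulty_levels(acc)
print(levels)  # {'inform_digits': 0, 'correct_digit': 1, 'split_number': 2}

samples = [{"action": a} for a in levels]
batch = curriculum_sample(samples, levels, k=700)
```

Re-measuring accuracy after each training round and recomputing `levels` would let actions move between levels as the DST module improves; the abstract does not say whether the thesis does this.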

Keywords/Search Tags:

Task-oriented dialogue system, Reinforcement learning, Asynchronous update, Curriculum learning, Reward

Related items

1	Research On Dialogue Policy Learning In Task-oriented Dialogue System
2	Research On The Key Technology Of Task-Oriented Dialogue Policies Based On The Deep Reinforcement Learning
3	Research On Key Technology And Application Of Task-oriented Dialogue System
4	Research On Task-based Dialogue Strategy Based On Reinforcement Learning
5	Research On Task-oriented Dialogue Policy Based On Deep Reinforcement Learning
6	Research And Application Of Self-dialogue In Dialogue Systems Based On Reinforcement Learning
7	Research And Implementation Of Task-Oriented Dialogue System For Government Affairs
8	The Study And Application Of Task-oriented Dialogue System Based On Multi-round Interaction
9	Towards Multi-Document Driven Task-Oriented Dialogue
10	Research On Task-Oriented Dialogue System In Instrument Field