
Research And Application Of Dialogue Management Via Deep Learning

Posted on: 2021-01-16   Degree: Doctor   Type: Dissertation
Country: China   Candidate: S Y Lei   Full Text: PDF
GTID: 1368330605481301   Subject: Computer Science and Technology
Abstract/Summary:
Human-machine dialogue has huge market demand in applications such as navigation, smart appliances, and customer service, and has been an active research area in recent years. Research on human-machine dialogue systems spans three areas: question answering, chatbots, and task-oriented dialogue systems. A task-oriented dialogue system aims to help users accomplish specific tasks, such as restaurant, hotel, or flight reservations, through multi-turn interaction. Within such a system, dialogue management infers the dialogue state from the output of natural language understanding (dialogue state tracking) and selects an appropriate action for natural language generation in response to the user utterance (dialogue policy). Dialogue management therefore plays a vital role in task-oriented dialogue systems. Traditionally, these modules are built separately and connected as a pipeline. More recently, the community has proposed end-to-end frameworks that integrate dialogue management with natural (spoken) language understanding by means of deep neural networks and reinforcement learning, which alleviates error accumulation. Despite promising results, many challenges remain: there is no end-to-end model that handles zero-shot adaptive transfer in dialogue state tracking; existing end-to-end, reinforcement-learning-based dialogue management applies only to tasks with a known ontology, whereas real scenarios contain many free-form slots with unbounded value spaces; because rewards are sparse, most studies resort to two-stage training in which the model is pre-trained with supervised learning and then fine-tuned with reinforcement learning; and building a user simulator for reinforcement learning requires developers to carefully hand-design its response strategy. This thesis presents a series of studies to alleviate these issues. The main contents and contributions are summarized as follows:

(1) Based on machine reading comprehension, we introduce two end-to-end dialogue state trackers, one for free-form slots and one for categorical slots, to handle zero-shot adaptive transfer: the slot description, dialogue context, and slot values are treated as the question, passage, and answers, respectively. For free-form slots, the proposed model points out slot values in the utterances via sequence labelling to track the user goals. For categorical slots, it ranks the candidate values using the last two utterances. Compared with the baseline on the SGD dataset, the proposed models handle several zero-shot services and domains effectively.

(2) We propose a word-based partially observable Markov decision process (POMDP) dialogue management model that obtains free-form slots via slot filling. The model is a hierarchical recurrent neural network: the bottom recurrent network performs slot filling on the user utterance; the top recurrent network acts as an implicit dialogue state tracker, updating the dialogue state from the utterance and system-action representations; and a multi-layer perceptron implements the dialogue policy on top of the resulting state representation. The model outputs the action and slot values simultaneously. Unlike existing work, it handles tasks with free-form slots, and its slot-filling labels are obtained by lexicalizing pre-defined templates, which is more convenient than collecting state-tracking labels.

(3) We propose a high-return prioritized experience replay algorithm that allows the dialogue policy to be optimized from scratch by reinforcement learning. Before each training round, the algorithm simulates a number of dialogues and judges their success from the reward signal. During training, successful high-return dialogue sequences are sampled, together with some random dialogue sequences to keep the model from converging to a local optimum. Experiments show that, compared with existing experience replay, high-return prioritized replay effectively accelerates convergence for reinforcement learning on sparse-reward dialogue tasks.

(4) We propose a multi-agent dialogue model in which an end-to-end dialogue manager cooperates with a user simulator to complete the dialogue task. Since the user simulator is itself one of the agents, its policy is optimized automatically rather than laboriously hand-crafted. For the simulator's reward function, we apply reward shaping based on adjacency pairs from conversation analysis so that the simulator quickly learns realistic user behavior. Finally, we generalize the one-to-one learning strategy to a one-to-many strategy in which the dialogue manager cooperates with several user simulators. Compared with the unconstrained multi-agent model, the model trained with adjacency-pair constraints converges faster and avoids deviating from normal human-human conversation. The experimental results also show that the dialogue manager trained with the one-to-many strategy performs best in cross-model evaluation with human users.

(5) Based on the above dialogue management models, we develop a Chinese conference-room booking dialogue system that provides an interactive platform for users to book conference rooms in natural language.
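To make the free-form tracking idea concrete, the following is a minimal sketch of the span-extraction step behind the sequence-labelling tracker. In the thesis, a trained model would emit per-token BIO tags for each slot; here the tags are supplied directly for illustration, and the function names and example utterance are hypothetical, not taken from the thesis.

```python
def extract_slot_value(tokens, bio_tags):
    """Recover a free-form slot value from BIO sequence-labelling output.

    `tokens` is the tokenized dialogue context (the MRC "passage");
    `bio_tags` holds the per-token predictions for one slot (the MRC
    "question"). In the thesis these tags come from the trained tracker;
    here they are given directly for illustration.
    """
    value, inside = [], False
    for token, tag in zip(tokens, bio_tags):
        if tag == "B":
            value, inside = [token], True   # start a new value span
        elif tag == "I" and inside:
            value.append(token)             # extend the current span
        else:
            inside = False
    return " ".join(value) if value else None

# Hypothetical user utterance: "book a table at Golden Dragon for tonight"
tokens = ["book", "a", "table", "at", "Golden", "Dragon", "for", "tonight"]
tags   = ["O",    "O", "O",     "O",  "B",      "I",      "O",   "O"]
# extract_slot_value(tokens, tags) -> "Golden Dragon"
```

Because the value is read directly out of the utterance, the tracker needs no pre-defined value list, which is what enables zero-shot transfer to unseen slots.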
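The high-return prioritized replay described above can be sketched as an episode-level buffer. The class name, the `success_return` threshold, and the success/other split below are illustrative assumptions, not the thesis implementation; the key idea is simply to sample mostly successful dialogues while mixing in random ones to avoid local optima.

```python
import random

class HighReturnReplayBuffer:
    """Episode-level replay buffer that prioritizes successful dialogues.

    An episode is a list of (state, action, reward) steps; it counts as
    successful when its total return reaches `success_return`, a stand-in
    for the thesis's success reward signal.
    """

    def __init__(self, success_return=1.0, random_ratio=0.2, capacity=1000):
        self.success_return = success_return
        self.random_ratio = random_ratio   # fraction of random episodes per batch
        self.capacity = capacity
        self.successes = []                # high-return dialogue sequences
        self.others = []                   # remaining dialogue sequences

    def add(self, episode):
        total_return = sum(r for _, _, r in episode)
        pool = self.successes if total_return >= self.success_return else self.others
        pool.append(episode)
        del pool[:-self.capacity]          # keep only the most recent episodes

    def sample(self, batch_size):
        """Sample mostly high-return episodes, plus some random ones to
        keep the policy from converging to a local optimum."""
        n_random = int(batch_size * self.random_ratio)
        n_success = batch_size - n_random
        batch = random.sample(self.successes, min(n_success, len(self.successes)))
        if self.others:
            batch += random.choices(self.others, k=n_random)
        return batch
```

In use, the trainer would simulate dialogues before each round, `add` each one, and train on `sample(batch_size)`; the random fraction plays the role of the random dialogue sequences mentioned above.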
Keywords/Search Tags:deep learning, reinforcement learning, dialogue management, user simulator, multi-agent