With the development of artificial intelligence, research in natural language processing has advanced, and task-oriented dialogue systems have found increasingly wide application. Task-oriented dialogue systems are now common in smart homes and intelligent customer service, bringing great convenience to people's lives. Within such a system, dialogue policy learning is a key module: it determines the dialogue actions the system feeds back to the user and directly affects the performance of the whole system. The study of dialogue policy learning in task-oriented dialogue systems is therefore of great significance.

At present, the main challenge for dialogue policy learning is that dialogue scenarios are becoming increasingly complex, and the dialogue samples used for training vary greatly in difficulty, so different samples pose different learning difficulties for the system. It is therefore necessary to model the complexity of dialogue samples and schedule the policy-learning training accordingly. In addition, the input to dialogue policy learning comes from the results of dialogue state tracking, so the performance of dialogue state tracking also affects policy learning. Finally, current dialogue policy learning methods are generally based on reinforcement learning, in which a simulated user serves as the training environment, so the quality of the user simulator likewise affects the performance of dialogue policy learning. This paper carries out a series of studies on these challenges. The specific research content and innovations are summarized as follows:

(1) We propose a dialogue policy learning model based on curriculum learning (Scheduled Dialogue Policy Learning, SDPL) to handle increasingly complex dialogue scene data and to address the varying learning difficulty that samples of different complexity present to the system. We introduce curriculum learning into dialogue policy learning and improve performance through a scheduled training process: the system first learns from simple dialogue samples and then gradually progresses to more complex ones. We design an automatic method to evaluate the complexity of dialogue samples and, based on this evaluation, propose an automatic curriculum learning framework that improves both the performance and the efficiency of dialogue policy learning.

(2) For the task of dialogue state tracking, we propose an optimized model based on gated convolutional networks (Gated Attentive Convolutional Network Dialogue State Tracker, GAC) to address the problem of long-sequence modeling. We use a gated attentive convolutional network as the encoder to obtain representations of long dialogue sequences. We also feed dialogue history into the convolutional encoder, compensating for the limitation of using only the current dialogue turn as input; the gated attentive convolutional encoder then processes the long dialogue sequence that results from introducing this historical information.

(3) To improve the training environment for dialogue policy learning, we optimize the user simulator with an adversarial network (Adversarial User Simulator, AUS), improving the simulator's quality and thereby indirectly improving the performance of dialogue policy learning. The user simulator learns from real user experience through adversarial learning, narrowing the gap between simulated and real user experience. Inspired by generative adversarial networks, we introduce a discriminator into the model to distinguish simulated experience from real user experience, thereby improving the quality of the experience produced by the generator. In addition, we introduce a memory module into the generator to make full use of dialogue history, making the user simulator's modeling more adequate.
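The scheduled, easy-to-hard training idea behind contribution (1) can be illustrated with a minimal sketch. The difficulty function and sample fields (`constraints`, `turns`) here are hypothetical stand-ins, not the paper's actual complexity measure; the point is only the curriculum mechanism of sorting by difficulty and gradually widening the training pool.

```python
import random

def difficulty(sample):
    # Hypothetical complexity score: number of user-goal constraints
    # plus dialogue length (both assumed fields, for illustration only).
    return len(sample["constraints"]) + sample["turns"]

def curriculum_pools(samples, stages=3):
    """Yield training pools from easy to hard, as in curriculum learning."""
    ordered = sorted(samples, key=difficulty)
    step = max(1, len(ordered) // stages)
    for stage in range(1, stages + 1):
        # Each stage exposes a larger, harder prefix of the sorted data.
        pool = ordered[: min(stage * step, len(ordered))]
        yield random.sample(pool, len(pool))  # shuffle within the pool

samples = [{"constraints": ["area"], "turns": 4},
           {"constraints": ["area", "food", "price"], "turns": 12},
           {"constraints": ["area", "food"], "turns": 7}]
stage_sizes = [len(pool) for pool in curriculum_pools(samples)]
print(stage_sizes)  # pool grows each stage as harder samples are admitted
```

An automatic curriculum, as proposed in the paper, would replace the fixed difficulty function with a learned or adaptive complexity estimate.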
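The gating used in gated convolutional encoders such as the one in contribution (2) is commonly a gated linear unit, where one convolution produces content and a second produces a sigmoid gate that controls information flow. This is a generic pure-Python sketch of that mechanism, not the GAC model itself; the kernels and sequence are made-up values.

```python
import math

def conv1d(seq, kernel):
    """Valid 1-D convolution of a float sequence with a small kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_conv(seq, kernel_a, kernel_b):
    """Gated linear unit: A * sigmoid(B). The sigmoid branch acts as a
    learned gate deciding how much of each convolved feature passes on,
    which helps when encoding long dialogue sequences."""
    a = conv1d(seq, kernel_a)  # content branch
    b = conv1d(seq, kernel_b)  # gate branch
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]

seq = [0.5, 1.0, -0.5, 2.0, 0.0]     # toy token features
out = gated_conv(seq, kernel_a=[1.0, -1.0], kernel_b=[0.5, 0.5])
print(len(out))  # width-2 kernel over 5 positions gives 4 outputs
```

In the actual model the kernels are learned parameters and the gating is combined with attention over the dialogue history.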
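The adversarial training in contribution (3) hinges on a discriminator that separates real user experience from simulated experience. The following toy sketch trains a one-dimensional logistic discriminator on hypothetical scalar "experience features"; the real AUS discriminator operates on rich dialogue representations, and the feature values here are invented for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_discriminator(real, fake, epochs=200, lr=0.5):
    """Toy logistic discriminator D(x), trained to output 1 for real
    user experience and 0 for generator-produced (simulated) experience."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in [(r, 1.0) for r in real] + [(f, 0.0) for f in fake]:
            p = sigmoid(w * x + b)
            # Gradient ascent on the log-likelihood of the label.
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

# Hypothetical features: real experience clusters high, simulated low.
real = [0.9, 1.1, 1.0]
fake = [0.1, 0.2, 0.0]
w, b = train_discriminator(real, fake)
d_real = sigmoid(w * 1.0 + b)   # should approach 1 (judged real)
d_fake = sigmoid(w * 0.1 + b)   # should approach 0 (judged simulated)
```

In the full adversarial setup, the generator (the user simulator) is then updated to increase the discriminator's score on its outputs, narrowing the gap between simulated and real experience.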