Deep-learning-based dialogue models have achieved breakthroughs, but problems remain, such as generic replies and a lack of personalized replies. A dialogue system based on reinforcement learning learns an optimal dialogue policy by interacting with the user, thereby improving the performance of the system. On the algorithmic side, the REINFORCE algorithm is improved, which also mitigates the long training time of generative dialogue models. On the system side, a dialogue system is built to address these problems. The specific research contents are summarized as follows:

(1) To increase response diversity, diverse beam search is used as the decoder, and self-critical sequence training is used to reduce the high variance of the policy gradient. Compared with the Advantage Actor-Critic algorithm, the improved REINFORCE algorithm uses only one network during training, which also saves model training time. Various types of filters are designed in the data pre-processing stage so that the corpus can be explored in a variety of ways. Analysis of the human and automatic evaluation results shows that the dialogue system performs well in response diversity, and that the improved reinforcement learning algorithm alleviates the generic-response and safe-response problems to some extent.

(2) To address the generic replies produced by the dialogue system and the long model training time, the author proposes several improvements. To increase response diversity, diverse beam search is used as the decoder, and self-critical sequence training is used to reduce the high variance of the policy gradient. Compared with the Advantage Actor-Critic algorithm, the improved REINFORCE algorithm uses only one network during training, which greatly reduces the complexity of the network model. To diversify and explore the corpus, several types of filters were designed
in the data pre-processing stage, and the running time of the experiments was recorded.

(3) In this paper, an attention-based hierarchical recurrent encoder-decoder model is used to address the problem of non-personalized responses in current dialogue systems. User-specific information in a conversation is often valuable because it relates directly to the content and style of the user’s responses, which in turn affect the course of the chat and the user experience. The hierarchical recurrent encoder-decoder model decomposes the conversation into two levels (words within an utterance, and utterances within a dialogue), so it can take into account both the long-term context and the user-specific information. Compared with the current RL model, the RL-ahred model noticeably improves dialogue quality.
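Item (1) names diverse beam search as the decoder. The abstract does not give its formulation, so the following is a minimal sketch assuming the common group-wise variant with a Hamming diversity penalty: beams are split into groups, the groups are expanded in turn at each step, and a candidate token is penalized by `diversity_strength` for every earlier group that already chose it at that step. The function names and the toy next-token table `toy_model` are hypothetical stand-ins for a trained decoder, not the thesis implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[str, ...]
Model = Callable[[Seq], Dict[str, float]]  # prefix -> {next token: log-prob}


def diverse_beam_search(model: Model, num_groups: int, beam_per_group: int,
                        max_len: int, diversity_strength: float,
                        eos: str = "</s>") -> List[List[Tuple[Seq, float]]]:
    """Group-wise beam search with a Hamming diversity penalty (sketch)."""
    groups: List[List[Tuple[Seq, float]]] = [[((), 0.0)] for _ in range(num_groups)]
    for _ in range(max_len):
        used: Dict[str, int] = {}  # tokens chosen by earlier groups at this step
        next_groups = []
        for beams in groups:
            # Each candidate: (sequence, log-prob, penalized score, newly extended?)
            candidates = []
            for seq, logp in beams:
                if seq and seq[-1] == eos:
                    candidates.append((seq, logp, logp, False))  # finished: carry over
                    continue
                for tok, tok_logp in model(seq).items():
                    new_logp = logp + tok_logp
                    # Hamming penalty: subtract lambda per earlier-group use of tok.
                    penalized = new_logp - diversity_strength * used.get(tok, 0)
                    candidates.append((seq + (tok,), new_logp, penalized, True))
            candidates.sort(key=lambda c: c[2], reverse=True)
            kept = candidates[:beam_per_group]
            for seq, _, _, extended in kept:
                if extended:
                    used[seq[-1]] = used.get(seq[-1], 0) + 1
            next_groups.append([(seq, logp) for seq, logp, _, _ in kept])
        groups = next_groups
    return groups


def toy_model(prefix: Seq) -> Dict[str, float]:
    """Hypothetical next-token table standing in for a trained decoder."""
    if len(prefix) == 0:
        return {"i": math.log(0.6), "sure": math.log(0.4)}
    if len(prefix) == 1:
        return {"know": math.log(0.9), "see": math.log(0.1)}
    return {"</s>": 0.0}


groups = diverse_beam_search(toy_model, num_groups=2, beam_per_group=1,
                             max_len=3, diversity_strength=10.0)
for g, beams in enumerate(groups):
    print(g, beams[0][0])
```

With `diversity_strength=0` every group makes the same choices as ordinary beam search; raising it pushes later groups away from tokens that earlier groups already committed to, which is the mechanism intended to reduce generic replies.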
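Item (1) also pairs REINFORCE with self-critical sequence training, which is why only one network is needed: the baseline is the reward of the model's own greedy decode rather than the output of a separate critic, as in Advantage Actor-Critic. The sketch below assumes a toy policy made of independent per-position logits (standing in for a decoder's per-step outputs) and a caller-supplied reward; `scst_update`, `scst_step`, and the reward function are hypothetical names. It uses the identity that the gradient of the log-softmax with respect to the logits is the one-hot vector of the chosen token minus the softmax probabilities.

```python
import math
import random
from typing import Callable, List, Sequence

Reward = Callable[[List[int]], float]  # hypothetical reward on a token-id sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_decode(logits: List[List[float]]) -> List[int]:
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def scst_update(logits: List[List[float]], sampled: List[int], greedy: List[int],
                reward_fn: Reward, lr: float) -> float:
    """Apply one self-critical REINFORCE update in place and return the advantage."""
    # Self-critical baseline: the reward of the policy's own greedy decode,
    # so no separate critic/value network is needed.
    advantage = reward_fn(sampled) - reward_fn(greedy)
    for t, row in enumerate(logits):
        probs = softmax(row)  # probabilities before this update
        for v in range(len(row)):
            # d log p(sampled[t]) / d logit[v] = onehot - softmax
            grad = (1.0 if v == sampled[t] else 0.0) - probs[v]
            row[v] += lr * advantage * grad  # gradient ascent on expected reward
    return advantage


def scst_step(logits: List[List[float]], reward_fn: Reward, lr: float,
              rng: random.Random) -> float:
    """Sample one sequence, decode greedily, and update the policy."""
    sampled = [rng.choices(range(len(row)), weights=softmax(row))[0] for row in logits]
    greedy = greedy_decode(logits)
    return scst_update(logits, sampled, greedy, reward_fn, lr)


# Toy usage: a one-position policy that starts out preferring token 0 (a stand-in
# for a generic reply) and a hypothetical reward that prefers token 1.
rng = random.Random(0)
policy = [[1.0, 0.0]]
prefers_one: Reward = lambda seq: 1.0 if seq[0] == 1 else 0.0
for _ in range(200):
    scst_step(policy, prefers_one, lr=0.5, rng=rng)
print(greedy_decode(policy))
```

When a sampled reply scores no better than the greedy one, the advantage is zero or negative, so the update never pushes the policy toward samples that are worse than its own current best guess; this is how the greedy baseline reduces variance without a learned value function.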
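Items (1) and (2) mention several types of data pre-processing filters without listing them. Purely as an illustration, the sketch below assumes three hypothetical filters on (context, response) pairs (a token-length bound, a blacklist of generic responses, and a check that drops responses echoing the context) and counts how many pairs each filter removes, which is one simple way to explore a corpus from several angles. None of these filters or thresholds come from the thesis.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, str]               # (context utterance, response utterance)
PairFilter = Callable[[Pair], bool]  # returns True if the pair should be kept


def length_filter(min_tokens: int, max_tokens: int) -> PairFilter:
    """Keep pairs whose two utterances both have a whitespace-token count in range."""
    def keep(pair: Pair) -> bool:
        return all(min_tokens <= len(u.split()) <= max_tokens for u in pair)
    return keep


def generic_response_filter(generic_replies: Iterable[str]) -> PairFilter:
    """Drop pairs whose response is on a (hypothetical) list of generic replies."""
    blacklist = {r.strip().lower() for r in generic_replies}
    def keep(pair: Pair) -> bool:
        return pair[1].strip().lower() not in blacklist
    return keep


def echo_filter(pair: Pair) -> bool:
    """Drop pairs whose response merely repeats the context."""
    return pair[0].strip().lower() != pair[1].strip().lower()


def apply_filters(pairs: Iterable[Pair], filters: Dict[str, PairFilter]
                  ) -> Tuple[List[Pair], Dict[str, int]]:
    """Run filters in order; a rejected pair is charged to the first filter that fails."""
    kept: List[Pair] = []
    dropped = {name: 0 for name in filters}
    for pair in pairs:
        for name, keep in filters.items():
            if not keep(pair):
                dropped[name] += 1
                break
        else:  # no filter rejected the pair
            kept.append(pair)
    return kept, dropped


corpus = [
    ("how are you", "I don't know"),
    ("hi", "hi"),
    ("what is your favourite film", "probably a science fiction one"),
    ("a", "b " * 50),
]
filters = {
    "length": length_filter(1, 20),
    "generic": generic_response_filter(["i don't know", "i'm not sure"]),
    "echo": echo_filter,
}
kept, dropped = apply_filters(corpus, filters)
print(dropped)
```

Charging each rejected pair to the first failing filter keeps the per-filter counts disjoint, so they add up to the total number of removed pairs.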
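Item (3) relies on a two-level encoder with attention. The following NumPy sketch shows only the forward pass under simplifying assumptions: plain tanh RNN cells instead of GRU or LSTM cells, random untrained weights, dot-product attention over the context-level states, and the last context state used as a stand-in decoder query. The abstract does not say how user-specific information is fed into RL-ahred, so no user features are modeled here; the class `TinyAttnHRED` and its methods are hypothetical.

```python
from typing import List, Tuple

import numpy as np


class TinyAttnHRED:
    """Forward-pass sketch of an attention-based hierarchical recurrent encoder.

    Level 1 (utterance encoder): a tanh RNN reads the word embeddings of one utterance.
    Level 2 (context encoder): a tanh RNN reads the sequence of utterance vectors.
    Attention: dot-product scores of a query against every context-level state.
    Weights are random and untrained; this only illustrates the data flow.
    """

    def __init__(self, vocab_size: int, emb_dim: int, hid_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.hid_dim = hid_dim
        self.emb = rng.normal(0.0, 0.1, (vocab_size, emb_dim))
        self.W_xu = rng.normal(0.0, 0.1, (hid_dim, emb_dim))  # word -> utterance state
        self.W_hu = rng.normal(0.0, 0.1, (hid_dim, hid_dim))
        self.W_xc = rng.normal(0.0, 0.1, (hid_dim, hid_dim))  # utterance -> context state
        self.W_hc = rng.normal(0.0, 0.1, (hid_dim, hid_dim))

    def encode_utterance(self, token_ids: List[int]) -> np.ndarray:
        h = np.zeros(self.hid_dim)
        for t in token_ids:
            h = np.tanh(self.W_xu @ self.emb[t] + self.W_hu @ h)
        return h  # final state summarizes the utterance

    def encode_dialogue(self, utterances: List[List[int]]) -> np.ndarray:
        h = np.zeros(self.hid_dim)
        states = []
        for utt in utterances:
            h = np.tanh(self.W_xc @ self.encode_utterance(utt) + self.W_hc @ h)
            states.append(h)
        return np.stack(states)  # shape: (num_turns, hid_dim)

    @staticmethod
    def attend(query: np.ndarray, states: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        scores = states @ query
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        return weights @ states, weights  # (context vector, attention weights)


# Toy usage: a three-turn dialogue of word ids from a hypothetical 10-word vocabulary.
model = TinyAttnHRED(vocab_size=10, emb_dim=4, hid_dim=8)
states = model.encode_dialogue([[1, 2, 3], [4, 5], [6, 7, 8, 9]])
context, weights = TinyAttnHRED.attend(states[-1], states)
print(states.shape, weights.round(3))
```

In a full model the attended context vector would feed a decoder RNN at every generation step; here the query is fixed so that the two encoding levels and the attention weights can be inspected directly.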