In recent years, the amount of information on the network has exploded, and recommendation systems have become an important tool for users to obtain information because they help filter out large amounts of irrelevant content. Traditional recommendation systems rely on large amounts of historical behavior data to model the user’s interest space. A conversational recommendation system instead asks targeted questions about user preferences. Compared with a traditional recommendation system, it obtains the user’s real-time preferences through active interaction, which makes the recommended items interpretable, removes the strong dependence on the user’s historical behavior data, and alleviates the information asymmetry between users and the system. Common conversational recommendation algorithms consist of two sub-modules, a dialogue task and a recommendation task. Current algorithms mostly process these two modules independently and optimize them with asynchronous training, so information is not shared between the two sub-modules and users are questioned according to a fixed template. The model cannot adjust its questioning strategy according to the information it already has, which makes the dialogue tend to be lengthy and the recommendations and questions it provides vague.

To solve these problems, this paper proposes a multi-layer reinforcement learning model. Considering the complexity of the conversational recommendation task, it is divided into three subtasks: a high-level target selection task, a middle-level item recommendation task, and a bottom-level dialogue task. The high-level task decides whether to ask about an attribute or to make the final recommendation. A reward function that decreases with the number of rounds constrains the model to minimize the number of rounds while maintaining recommendation accuracy. Afterwards, the recommendation task and the dialogue task are
linked sequentially, and the information passed between them helps the model acquire and use information more effectively, give more accurate recommendations and questions, and reduce the number of dialogue rounds, which improves the user experience.

At the same time, because user data in the conversational recommendation setting is sparse, it is difficult to obtain an accurate user vector representation, which limits the performance of both the dialogue and recommendation models. A graph structure can extract a large amount of implicit information that enriches the model’s understanding of users and content and yields more accurate vector representations of both. However, in conversational recommendation it is difficult to capture the preferences a user expresses in the current dialogue, because the user’s labels in a single round of dialogue are very sparse. To address the problems caused by this scarcity of user labels, this paper proposes a graph model that integrates self-supervised learning. Two neural networks, one extracting the user’s historical preference information and the other extracting real-time preference information, are used to construct positive and negative samples that generate self-supervised signals. In the absence of user labels, these signals help the model learn the preference information expressed by the user in the current conversation and generate a more accurate representation vector.

To verify the effectiveness of the proposed algorithms, comparative experiments were carried out on two real datasets, ReDial and INSPIRED. Compared with mainstream algorithms, the proposed multi-layer reinforcement learning conversational recommendation model improves Recall@50 by 9.1%, Dist-3 by 13.2%, and BLEU-1 by 5.9%. Compared with the multi-layer reinforcement learning method, the method integrating self-supervised learning improves Recall@50 by a further 4.1%, and by 7.5% in the
cold-start scenario. The proposed multi-layer reinforcement learning model adapts better to the interactive dialogue strategy with users in conversational recommendation. With the self-supervised learning algorithm, it obtains more accurate representations from sparse user information and achieves better results in both recommendation and dialogue.
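
The reward that decreases with the number of rounds, mentioned above, can be illustrated with a minimal Python sketch. The abstract does not give the actual formula, so the exponential decay, the function name `round_decayed_reward`, and the parameters `base_reward`, `decay`, `fail_penalty`, and `max_turns` are all assumptions made for illustration only.

```python
def round_decayed_reward(
    success: bool,
    turn: int,
    max_turns: int = 10,        # hypothetical cap on dialogue rounds
    base_reward: float = 1.0,   # hypothetical reward for a hit in round 1
    decay: float = 0.9,         # hypothetical per-round decay factor
    fail_penalty: float = -0.3, # hypothetical penalty for a failed recommendation
) -> float:
    """Return a reward that shrinks the later a successful recommendation occurs.

    This is an assumed exponential form, not the formula used in the paper.
    """
    if not 1 <= turn <= max_turns:
        raise ValueError("turn must be between 1 and max_turns")
    if not success:
        # A failed recommendation is penalized regardless of the round.
        return fail_penalty
    # A successful recommendation earns less reward in later rounds,
    # which pushes the policy toward shorter dialogues.
    return base_reward * decay ** (turn - 1)


if __name__ == "__main__":
    for t in (1, 2, 5):
        print(t, round_decayed_reward(True, t))
```

Under this assumed form, a policy that recommends correctly in an earlier round always receives more reward than one that succeeds later, which is one simple way to trade off accuracy against dialogue length.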