| As a major branch of machine learning, reinforcement learning learns an optimal control policy from data generated by the interaction between an agent and its environment. Computing the gradient of the objective function with respect to the policy parameters has been the dominant approach to solving reinforcement learning problems, while evolution-based reinforcement learning algorithms have emerged in recent years. Compared with gradient-based algorithms, evolutionary algorithms do not need to compute gradients, which shortens training time, and they parallelize well, which makes them run more efficiently. However, although evolutionary algorithms can complete training in a short time, training requires far more interactions with the environment than gradient-based reinforcement learning does. In reinforcement learning, each interaction with the environment carries a cost, especially in real-world applications. In robot manipulation, for example, the model is very likely to fail early in training, and the robot may be damaged or incur other costs. We therefore aim to improve reinforcement learning algorithms so that they need fewer interactions with the environment, or achieve better performance with the same number of interactions. Both contributions of this dissertation build on the idea of negatively correlated search: they exploit its ability to search multiple different regions of the search space simultaneously, and the diversity it provides at the level of search behavior, to improve algorithm performance. In the first contribution, we combine negatively correlated search with the natural evolution strategy and propose the Negatively Correlated Natural Evolution Strategy (NCNES). The basic design of the NCNES algorithm is based on the 
natural evolution strategy algorithm framework. Following the idea of negatively correlated search, we design an objective function that accounts for both solution quality and diversity, and derive its natural gradient. To verify the performance of NCNES, we ran experiments on arcade games, using the gradient-based reinforcement learning algorithms A3C and PPO and the evolutionary reinforcement learning algorithm CES as baselines. The results show that NCNES achieves competitive performance on three games, Enduro, Freeway and Beam Rider, and is more stable during training. Our second contribution is an algorithm called NCSCC, which is based on the cooperative coevolution framework and the negatively correlated search algorithm. An excessive number of parameters to evolve is a major factor limiting the performance of evolutionary algorithms. In reinforcement learning, even for simple arcade games such as Breakout, the policy model commonly uses a three-layer convolutional neural network followed by two fully connected layers, and the total number of parameters exceeds one million. When evolutionary algorithms are used to evolve the parameters of the policy model, so many parameters give rise to the "curse of dimensionality". To address this problem, we use the cooperative coevolution framework to divide the parameters into groups, optimize only one group at a time, and finally combine the optimization results of all groups; we also modify how partial solutions are evaluated in the cooperative coevolution framework to make it better suited to negatively correlated search. Noise in reinforcement learning problems can mislead the search direction and undermine the elitist strategy; therefore, when training the policy model, we use the result of multiple evaluations as the fitness value. The experimental results show that the proposed NCSCC algorithm is not weaker than the gradient-based reinforcement learning 
algorithms, and is significantly stronger than the typical evolutionary reinforcement learning algorithm CES. Moreover, compared with the negatively correlated search algorithm without the cooperative coevolution framework, NCSCC performs better, confirming that the cooperative coevolution framework mitigates the "curse of dimensionality" to some extent. |
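
The parameter grouping and repeated evaluation described in the abstract can be illustrated with a minimal sketch. It is not the NCSCC implementation. The inner search is a plain Gaussian-perturbation elitist search standing in for negatively correlated search, the objective is a hypothetical noisy sphere function rather than an arcade game, and every name and default value (`noisy_fitness`, `averaged_fitness`, `split_into_groups`, `cooperative_coevolution`, `n_evals`, `sigma`) is an assumption made for illustration.

```python
import random


def noisy_fitness(params, rng, noise=0.1):
    """Hypothetical noisy objective: negative sphere function plus Gaussian noise."""
    return -sum(x * x for x in params) + rng.gauss(0.0, noise)


def averaged_fitness(params, rng, n_evals=5):
    """Average several noisy evaluations and use the mean as the fitness value."""
    return sum(noisy_fitness(params, rng) for _ in range(n_evals)) / n_evals


def split_into_groups(dim, n_groups):
    """Partition the indices 0..dim-1 into contiguous groups of near-equal size."""
    size = (dim + n_groups - 1) // n_groups
    return [list(range(i, min(i + size, dim))) for i in range(0, dim, size)]


def cooperative_coevolution(dim=12, n_groups=3, cycles=30, pop=8, sigma=0.3, seed=0):
    """Optimize one parameter group at a time while the other groups stay fixed."""
    rng = random.Random(seed)
    # The context vector is the current full solution shared by all groups.
    context = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
    groups = split_into_groups(dim, n_groups)
    for _ in range(cycles):
        for group in groups:
            best, best_fit = context, averaged_fitness(context, rng)
            for _ in range(pop):
                # Perturb only the active group (stand-in for the real search step).
                candidate = list(context)
                for i in group:
                    candidate[i] += rng.gauss(0.0, sigma)
                # A partial solution is scored by evaluating the full vector.
                fit = averaged_fitness(candidate, rng)
                if fit > best_fit:  # elitist selection on the averaged fitness
                    best, best_fit = candidate, fit
            context = best  # merge the group's result back into the full solution
    return context
```

Calling `cooperative_coevolution()` returns the final full parameter vector. Each inner loop searches only `dim / n_groups` dimensions, which is the property the abstract relies on to ease the "curse of dimensionality". Averaging `n_evals` evaluations trades extra environment interactions for a less noisy elitist comparison.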