Font Size: a A A

Improvement And Research On Progressive Algorithm For Beinforcement Learning

Posted on:2023-05-17Degree:MasterType:Thesis
Country:ChinaCandidate:G J WuFull Text:PDF
GTID:2558306629974669Subject:Computer technology
Abstract/Summary:PDF Full Text Request
In recent years,deep reinforcement learning,which combines deep learning and reinforcement learning,has made remarkable achievements in the field of artificial intelligence.The deep reinforcement learning method not only uses the powerful representation ability of a deep neural network but also uses the autonomous decision-making ability of the reinforcement learning algorithm.It has shown strong universality in many learning fields and achieved good results.Continuous control tasks are a hot research field.In the deep reinforcement learning algorithm,the deep deterministic strategy gradient algorithm based on deterministic strategy gradient and actor-critic architecture is usually used.In the face of large-scale state-space tasks,the single actor-network in the deep deterministic strategy gradient algorithm is difficult to deal with,and there are problems such as blind exploration and maximization deviation.In this paper,the depth deterministic strategy algorithm is studied in the following three aspects:i.The single actor-network used in the deep deterministic strategy gradient algorithm is difficult to deal with the complex state space so the actor-network learning will be affected by different states.To solve this problem,a progressive multi-agent depth deterministic strategy gradient algorithm based on K-means clustering is proposed.In the training process,for the current state at each time step,when selecting the action,the algorithm selects the corresponding actor-network according to the K-means discrimination result.At the same time,to increase the effectiveness of the algorithm,the number of K-means cluster and the number of actor-networks are gradually increased with the increase of training time steps.The algorithm is applied to the mujoco simulation platform.The experimental results show that the algorithm has good results in most continuous control tasks.ii.Deep deterministic strategy gradient algorithm has good results for some simple continuous action space tasks,but when the state space of the task tends to be complex,a single actor-network is difficult to deal with,and there are some problems such as non-optimal action and catastrophic forgetting.Although the above(i)algorithm can effectively solve this kind of problem,in the(i)algorithm,the time complexity and training cost of K-means clustering and discrimination are large.A large number of experimental results show that the state space of almost continuous action space tasks meets the synchronous change of state and training time step during training.Using this idea,based on the above(i),replacing K-means clustering,discrimination,and other operations with time steps can effectively reduce the time complexity.In addition,excellent experience is added to guide the selection of actions to avoid blind exploration.Combining the two,a progressive multi-agent depth deterministic strategy gradient algorithm based on experience guidance is proposed.Experimental results show that the algorithm has low time complexity and excellent effect.iii.Classified empirical playback method can solve the problems of insufficient use of empirical samples and random sampling in-depth deterministic strategy gradient algorithm.The classified experience playback method compares the experience samples with the classification standards,then stores the experience samples into the corresponding experience buffer pool according to the comparison results,and then extracts different proportions of experience samples from different experience buffer pools for training according to the needs.This method can make full use of empirical samples.At the same time,due to classified storage,the correlation between different empirical samples is also weakened.However,the classified experience playback method fixes the number of experience pools.At the initial stage of training,the number of experiences in each experience pool increases slowly due to experience classification,so it is difficult to make effective use of experience training.Aiming at this kind of problem,combining the classification experience playback method with the progressive method,a depth deterministic strategy gradient algorithm based on the progressive classification experience playback is proposed.Compared with the simple classification empirical playback method,the experimental results show that the algorithm has a better effect in most continuous control tasks.
Keywords/Search Tags:Reinforcement learning, deep reinforcement learning, experience guidance, classification experience replay, deterministic policy gradient
PDF Full Text Request
Related items