
Research On Security Deep Reinforcement Learning Based On Experiences

Posted on: 2020-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: W Wu
Full Text: PDF
GTID: 2428330578477962
Subject: Software engineering
Abstract/Summary:
Deep reinforcement learning combines the advantages of reinforcement learning and deep learning and has made great progress in improving training performance on video games. However, the training process of deep reinforcement learning is inseparable from a large amount of trial and error, and this process does not take the serious consequences of security risks into account. In reality, agents are costly: blind trial and error greatly shortens an agent's service life and can even damage other devices in its interactive environment. In addition, deep reinforcement learning algorithms suffer from low sample-utilization efficiency, slow convergence, and poor training stability. This thesis focuses on the security risks of deep reinforcement learning and improves deep reinforcement learning algorithms through both model-structure optimization and algorithmic refinement. The following three contributions are proposed.

i. Upper Confidence Bound Deep Q-Network. The standard deep Q-network samples the experience pool uniformly at random. This method cannot distinguish how important each sample is, so excellent samples are easily missed during training, which slows the algorithm's convergence and leaves the agent exposed to danger for longer. To address this problem, the upper confidence bound is introduced into the deep Q-network, improving the utilization efficiency of experience samples; the effectiveness of the algorithm is verified through experiments on Atari 2600 games.

ii. Dual-Network Security Deep Reinforcement Learning Based on Function Restriction. To address the security risks caused by the agent's unlimited trial and error in deep reinforcement learning, a dual-network security deep reinforcement learning method is proposed. On the one hand, a new function is constructed by optimizing the criterion to limit meaningless exploration; on the other hand, high-value samples are fully trained by constructing a double deep network. Experiments show that these improvements to the model's structure and algorithm effectively reduce the number of times the agent enters a dangerous state during training.

iii. Continuous-Domain Security Deep Reinforcement Learning Based on Priority Clustering. In a continuous state-action space, the experience samples obtained through the agent's exploration are vector-valued, so the traditional optimization algorithms for discretized deep reinforcement learning cannot be applied. To address this problem, a continuous-domain security deep reinforcement learning method based on priority clustering is proposed. The algorithm extracts excellent samples from the experience pool through clustering and improves sample-utilization efficiency through a priority algorithm. Experiments show that the algorithm effectively reduces the number of times the agent falls into danger and improves training stability in continuous-space deep reinforcement learning problems.

The three algorithms proposed in this thesis effectively alleviate the agent's security problem in deep reinforcement learning in both discrete and continuous state-action environments. Finally, the effectiveness of the proposed algorithms is verified on the classic Atari 2600 games and MuJoCo environments.
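The abstract does not give implementation details for contribution i, but the core idea of UCB-prioritized replay can be sketched as follows. This is a minimal illustrative sketch, not the thesis's method: it assumes a running |TD error| as each transition's value estimate and scores transitions with the classic UCB formula, error + c * sqrt(ln(total draws) / draws of this sample), so rarely drawn but high-error samples are favored over uniform random sampling. All class and method names here are hypothetical.

```python
import math


class UCBReplayBuffer:
    """Replay buffer sketch: rank transitions by an upper confidence bound
    on their |TD error| instead of sampling uniformly at random."""

    def __init__(self, capacity=10000, c=2.0):
        self.capacity = capacity
        self.c = c              # exploration weight in the UCB score
        self.buffer = []        # stored transitions
        self.errors = []        # running |TD error| per transition
        self.counts = []        # times each transition has been drawn
        self.total = 0          # total draws across the buffer

    def add(self, transition, td_error=1.0):
        if len(self.buffer) >= self.capacity:        # evict oldest
            self.buffer.pop(0)
            self.errors.pop(0)
            self.counts.pop(0)
        self.buffer.append(transition)
        self.errors.append(abs(td_error))
        self.counts.append(1)   # start at 1 to avoid division by zero

    def sample(self, batch_size):
        """Return the batch_size transitions with the highest UCB scores."""
        self.total += batch_size
        scores = [e + self.c * math.sqrt(math.log(self.total + 1) / n)
                  for e, n in zip(self.errors, self.counts)]
        idx = sorted(range(len(scores)),
                     key=lambda i: scores[i], reverse=True)[:batch_size]
        for i in idx:
            self.counts[i] += 1  # drawn samples become less novel
        return [self.buffer[i] for i in idx], idx

    def update_errors(self, idx, td_errors):
        """Refresh priorities after a learning step."""
        for i, e in zip(idx, td_errors):
            self.errors[i] = abs(e)
```

In a DQN training loop, `update_errors` would be called with the new TD errors of the sampled batch, so the bonus term gradually shifts sampling toward transitions the network still predicts poorly.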
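For contribution iii, the abstract only states that clustering extracts excellent samples and a priority rule improves sample utilization. One plausible reading can be sketched as follows: cluster transitions by their continuous state vectors with a small k-means, then draw batch members from clusters with probability proportional to a cluster priority (here, mean |reward| as a stand-in, since the thesis's actual priority measure is not given). All function names and the priority choice are hypothetical.

```python
import random


def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=15):
    """Tiny k-means over fixed-length state vectors."""
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: squared_dist(p, centers[c]))
            groups[j].append(p)
        centers = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def sample_by_cluster_priority(transitions, k=2, batch_size=4):
    """Cluster transitions (state, action, reward, next_state) by state,
    then draw from clusters weighted by mean |reward| as the priority."""
    states = [t[0] for t in transitions]
    centers = kmeans(states, k)
    buckets = [[] for _ in range(k)]
    for t in transitions:
        j = min(range(k), key=lambda c: squared_dist(t[0], centers[c]))
        buckets[j].append(t)
    prios = [sum(abs(t[2]) for t in b) / len(b) if b else 0.0
             for b in buckets]
    total = sum(prios) or 1.0
    batch = []
    for _ in range(batch_size):
        r, acc = random.random() * total, 0.0
        for b, p in zip(buckets, prios):
            acc += p
            if b and r <= acc:
                batch.append(random.choice(b))
                break
        else:                      # all priorities zero: fall back to uniform
            batch.append(random.choice(transitions))
    return batch
```

The design point the sketch illustrates is that clustering gives a coarse structure over an otherwise continuous, vector-valued experience pool, so a discrete priority scheme can be applied per cluster even though no two states are ever identical.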
Keywords/Search Tags: reinforcement learning, deep reinforcement learning, security deep reinforcement learning, experience replay, continuous space