
Research on Stability Evaluation of A3C Based on Skewness and Sparseness

Posted on: 2022-08-15  Degree: Master  Type: Thesis
Country: China  Candidate: H Li  Full Text: PDF
GTID: 2518306563463114  Subject: Computer technology
Abstract/Summary:
Reinforcement learning has been applied in many fields, such as information theory, robotics, automatic control, and autonomous driving, and in-depth research has shown that some reinforcement learning algorithms are capable of solving complex problems. Because these algorithms often face complex and changeable application scenarios, an algorithm's stability directly affects its actual running results. Existing work on the stability evaluation of reinforcement learning algorithms has achieved good results, mostly by applying various kinds of attacks to an algorithm or by modifying its hyperparameters. However, such work focuses not on the algorithm's running process itself but on changing the external and internal environment in which it runs, and it does not measure the stability of the algorithm's own normal training process. To address this deficiency, this thesis proposes two stability evaluation methods for A3C, a mainstream reinforcement learning algorithm. The main research contents and contributions are as follows:

(1) For finite state spaces, a static stability evaluation method based on a time reference is proposed. Starting from a complete characterization of the finite state space, the skewness of the deviation matrix of the action probabilities and the sparseness of the difference matrix are sampled during training at five sampling intervals. The skewness vector and the sparseness vector are then normalized and combined under different values of a weighting parameter to obtain a stability score for the algorithm, which serves as a mathematical measure of its stability (see the first sketch below). Finally, the resulting rankings are compared with the actual convergence of the model as judged by expert experience, yielding the accuracy of the static stability evaluation.

(2) For infinite state spaces, a dynamic stability evaluation method based on difference coefficients is proposed. The infinite state space is first reduced to a finite one by equal-interval sampling. Under five different initial sampling intervals, the change of the difference coefficient of the action-probability deviation matrix between adjacent sampling moments is observed, and the sampling interval is adjusted dynamically to complete the data sampling (see the second sketch below). Comparing the resulting stability scores with the true convergence of the model yields the accuracy of the dynamic evaluation.

Both methods are validated experimentally on the gym-maze pathfinding game and the Mountain Car hill-climbing game. In the experiments, several combinations of learning rate and number of agents are tested and, together with different values of the weighting parameter, used for a comprehensive stability evaluation of A3C, identifying the learning rate and agent count under which the algorithm is most stable. In the static evaluation, the highest accuracy of the stability evaluation reaches 50%; in the dynamic evaluation, it reaches 83.3%, fulfilling the research objective of this thesis.
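To make the static method in (1) concrete, the following Python sketch computes a stability score from snapshots of the action-probability matrix collected during training. The helper name stability_score, the reading of the deviation/difference matrix as the element-wise change between adjacent snapshots, the near-zero threshold eps, the min-max normalization, the weighting parameter alpha (standing in for the parameter symbol in the original), and the score's orientation are all illustrative assumptions, not the thesis's exact formulation.

```python
# A minimal sketch of the static stability score, assuming the deviation/
# difference matrix is the element-wise change of the (n_states x n_actions)
# action-probability matrix between adjacent training snapshots.
import numpy as np
from scipy.stats import skew

def stability_score(prob_snapshots, alpha=0.5, eps=1e-3):
    """prob_snapshots: list of (n_states, n_actions) arrays sampled at
    fixed intervals during training; alpha: assumed weighting parameter."""
    skews, sparsities = [], []
    for prev, curr in zip(prob_snapshots, prob_snapshots[1:]):
        diff = curr - prev                               # difference matrix
        skews.append(abs(skew(diff.ravel())))            # skewness of its entries
        sparsities.append(np.mean(np.abs(diff) < eps))   # near-zero fraction

    def minmax(v):                                       # normalize each vector
        v = np.asarray(v, dtype=float)
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

    # Orientation assumed here: low skewness and a sparse difference matrix
    # both indicate a settled policy, so a higher score reads as more stable.
    s = alpha * (1 - minmax(skews)) + (1 - alpha) * minmax(sparsities)
    return float(s.mean())
```

For example, the policy's action probabilities over the maze's states could be snapshotted every few hundred training steps, and the per-configuration scores ranked against the convergence observed by an expert.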
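Likewise, here is a minimal sketch of the adaptive sampling loop behind the dynamic method in (2), assuming the "difference coefficient" is the relative Frobenius norm of the deviation matrix between adjacent samples; the interval-update rule (shorten when the coefficient jumps, lengthen when it settles) and the names get_probs, tol are one plausible reading of the abstract, not the thesis's exact rule.

```python
# A minimal sketch of adaptive sampling for the dynamic method, assuming the
# "difference coefficient" is the relative Frobenius norm of the deviation
# matrix between adjacent samples.
import numpy as np

def adaptive_sampling(get_probs, total_steps, init_interval, tol=0.1):
    """get_probs(step) -> (n_states, n_actions) action probabilities over the
    finite subset of states obtained by equal-interval state sampling."""
    samples = [get_probs(0)]
    step, interval, prev_coeff = 0, init_interval, None
    while step + interval <= total_steps:
        step += interval
        samples.append(get_probs(step))
        diff = samples[-1] - samples[-2]                  # deviation matrix
        coeff = np.linalg.norm(diff) / (np.linalg.norm(samples[-2]) + 1e-12)
        if prev_coeff is not None:
            if coeff > prev_coeff * (1 + tol):
                interval = max(1, interval // 2)  # policy shifting: sample more often
            elif coeff < prev_coeff * (1 - tol):
                interval *= 2                     # policy settling: sample less often
        prev_coeff = coeff
    return samples
```

The snapshots gathered this way can then be scored with the same skewness/sparseness measure as in the static case.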
Keywords/Search Tags:Reinforcement Learning, Asynchronous Advantage Actor-Critic Algorithm, Stability Assessment, Skewness, Sparseness