
Research On Reinforcement Learning Network Algorithm With Self-adaptive Basis Function

Posted on: 2019-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Wang
Full Text: PDF
GTID: 2428330551958020
Subject: Control Science and Engineering

Abstract/Summary:
Reinforcement learning is one of the most important areas of machine learning, in which an agent learns how to act by interacting with its environment. Reinforcement learning is a process of seeking an optimal policy. Policy evaluation is a central method in reinforcement learning: it assesses a given policy based on its value function. In general, the value function is estimated by linearly parameterized function approximation. Previous algorithms improve their accuracy by tuning the network weights while leaving the basis functions fixed. In fact, the basis functions used in value function approximation have a significant impact on the ability of the algorithm. The centers of the basis functions can be determined from the problem to be solved; however, it is difficult to set the widths of the basis functions, and researchers usually run many experiments to find proper width values. In this research, a neural network architecture is used to realize reinforcement learning algorithms in which the widths of the basis functions are tuned automatically. The main contributions of this paper are as follows:

1. An adaptive RC-network is proposed. In the proposed algorithm, the parameters of the basis functions, especially their widths, are tuned automatically toward the optimum during value function approximation. The TD error and the value function are estimated by value function approximation and RC reinforcement learning; the TD error is then back-propagated to update the parameters of the basis functions, namely the widths and weights. In this way, the proposed algorithm adapts toward optimal performance during learning. Experimental results and theoretical analysis show that this algorithm performs better in both policy evaluation and policy iteration. For policy evaluation, the proposed algorithm approximates the value function more accurately and more steadily. For policy iteration, compared with traditional methods, it takes fewer steps to reach the goal, with a steadier learning process, on control problems.

2. An adaptive iLSTDC-network is proposed, which confirms the practicability of the network architecture. In this algorithm, the value function and the TD error are approximated by a network; the TD error is then back-propagated to update the widths of the basis functions and the network weights. In this way, the network is optimized gradually by tuning its parameters. Finally, the optimized network is employed for policy evaluation and policy iteration. Experimental results show that this algorithm achieves better performance, while the computation time per step is dramatically reduced.
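The core width-adaptation idea shared by both contributions can be sketched as semi-gradient TD(0) over Gaussian radial basis functions, where the TD error is back-propagated into both the linear weights and the basis-function widths. This is an illustrative sketch only: the RC and iLSTDC update rules, the step sizes, and the toy one-dimensional task below are assumptions for demonstration, not the thesis's actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian RBF features over a 1-D state space. Centers are fixed
# (determined from the problem, as in the abstract); the widths
# sigma are the parameters tuned automatically online.
centers = np.linspace(0.0, 1.0, 10)   # fixed RBF centers
sigma = np.full(10, 0.2)              # widths, adapted during learning
w = np.zeros(10)                      # linear output weights

def phi(s, sigma):
    """Gaussian RBF feature vector for scalar state s."""
    return np.exp(-(s - centers) ** 2 / (2.0 * sigma ** 2))

def value(s, w, sigma):
    """Approximate value function V(s) = phi(s)^T w."""
    return phi(s, sigma) @ w

def td_step(s, r, s_next, w, sigma, gamma=0.9, alpha_w=0.1, alpha_s=0.01):
    """One semi-gradient TD(0) update of both weights and widths.

    The TD error delta is 'back-propagated': weights move along
    dV/dw = phi(s), and widths along dV/dsigma_i = w_i * dphi_i/dsigma_i.
    """
    f = phi(s, sigma)
    delta = r + gamma * value(s_next, w, sigma) - f @ w   # TD error
    w = w + alpha_w * delta * f
    # Chain rule: dphi_i/dsigma_i = phi_i * (s - c_i)^2 / sigma_i^3
    dphi_dsigma = f * (s - centers) ** 2 / sigma ** 3
    sigma = sigma + alpha_s * delta * w * dphi_dsigma
    sigma = np.clip(sigma, 1e-2, None)  # keep widths positive
    return w, sigma, delta

# Evaluate a fixed policy on a toy chain with reward r = s.
for _ in range(2000):
    s = rng.uniform(0.0, 1.0)
    s_next = min(s + 0.1, 1.0)
    w, sigma, delta = td_step(s, s, s_next, w, sigma)
```

The point of the sketch is the second update line: instead of treating the widths as hand-tuned hyperparameters, the same TD error that drives the weight update also drives the widths, so the feature representation and the value estimate are optimized jointly.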
Keywords/Search Tags:reinforcement learning, policy evaluation, policy iteration, value function, function approximation, basis function