
Reinforcement Learning Algorithms for Semi-Markov Decision Processes

Posted on: 2019-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Yang
Full Text: PDF
GTID: 2428330566998675
Subject: Control engineering
Abstract/Summary:
As a major branch of machine learning, reinforcement learning (RL) plays an important role in intelligent decision-making and behavior cognition, and it is an effective way to solve stochastic sequential decision problems. Although RL algorithms have made notable progress in both basic theory and applications, most studies use Markov decision processes (MDPs) as the model of the system's environment. In practical applications, however, many environments cannot be described accurately by an MDP. Semi-Markov decision processes (SMDPs), which provide an effective way to model the time factor, overcome this shortcoming and offer broad research prospects.

To address the limitations of current MDP-based RL algorithms, this dissertation proposes a unified analytical framework for extending RL algorithms from MDPs to SMDPs. Under the average reward criterion, the framework derives the continuous-time and discrete-time Bellman optimality equations by means of performance sensitivity analysis. From the iterative form of the Bellman optimality equation, the Q-value update formula for the state-action pairs of the SMDP RL algorithm is obtained. Simulation results demonstrate the convergence of the SMDP RL algorithms developed in this dissertation and verify the validity of the analytical framework. In addition, two new SMDP RL algorithms are obtained by combining the framework with the incremental value iteration (IVI) algorithm and the stochastic shortest path (SSP) value iteration algorithm, and the dichotomy (bisection) method is introduced to obtain further SMDP RL algorithms. The proposed framework makes extending RL algorithms to SMDPs simple and intuitive, and it provides useful guidance for research on RL algorithms.

Simulation experiments verify the convergence of the algorithms and the effectiveness of the resulting policies. The results show that the algorithms obtained here converge considerably faster than other SMDP RL algorithms. In the unmanned vehicle simulation experiment, the policies produced by the IVI and SSP RL algorithms result in zero accidents. This unmanned ground vehicle (UGV) driving example demonstrates the validity of the theory and the practicality of the algorithms. The research also extends the applicable scope of SMDP reinforcement learning algorithms.
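The abstract does not reproduce the dissertation's Q-value update formula. As a rough illustration of what an average-reward Q-update for an SMDP can look like, the sketch below uses a generic tabular, SMART-style rule in which the reward of a transition is offset by the estimated average reward rate multiplied by the sojourn time. This is an assumed stand-in, not the thesis's derived formula, and all function and parameter names (`smdp_q_update`, `update_gain`, `rho`, `sojourn_time`) are hypothetical.

```python
def smdp_q_update(q, state, action, reward, sojourn_time, next_state, rho, alpha):
    """Apply one tabular average-reward SMDP Q-update (generic SMART-style rule).

    q is a list of per-state lists of Q-values; rho is the current estimate of
    the average reward per unit time; alpha is the learning rate.
    This is an illustrative assumption, not the dissertation's formula.
    """
    # Reward earned during the transition, minus what the average rate would
    # have paid over the same sojourn time, plus the best next-state value.
    target = reward - rho * sojourn_time + max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def update_gain(total_reward, total_time, reward, sojourn_time):
    """Update running totals and return the new average reward rate estimate."""
    total_reward += reward
    total_time += sojourn_time
    return total_reward, total_time, total_reward / total_time
```

In the MDP special case every sojourn time equals 1, so the correction term reduces to subtracting `rho`, which is the familiar average-reward Q-learning form.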
Keywords/Search Tags:SMDP, average reward criteria, performance sensitivity, RL algorithm