Reinforcement Learning Algorithms For Semi-Markov Decision Processes

Posted on:2019-08-21

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Yang


GTID:2428330566998675

Subject:Control engineering

Abstract/Summary:

As a major family of machine learning algorithms, reinforcement learning (RL) plays an important role in intelligent decision-making and behavior cognition, and it is an effective way to solve stochastic sequential decision problems. Although RL algorithms have achieved notable results in both basic theory and applications, most studies use Markov decision processes (MDPs) as the environment model. In practice, however, many environments cannot be described accurately by an MDP. Semi-Markov decision processes (SMDPs), which explicitly model the time that elapses between decisions, overcome this shortcoming and offer broad research prospects.

To address the limitation that current RL algorithms are built mainly on MDPs, this dissertation proposes a unified analytical framework for extending RL algorithms from MDPs to SMDPs. Under the average reward criterion, the framework uses performance sensitivity analysis to derive Bellman optimality equations of both the continuous-time type and the discrete-time type. The iterative form of the Bellman optimality equation then yields the Q-value update formula for state-action pairs in SMDP RL algorithms. Simulation results demonstrate the convergence of the SMDP RL algorithms developed in this dissertation and verify the validity of the analytical framework.

In addition, by combining the framework with the incremental value iteration (IVI) algorithm and the stochastic shortest path (SSP) value iteration algorithm, this dissertation obtains two new SMDP RL algorithms. The dichotomy (bisection) method is also introduced to obtain further SMDP RL algorithms. The proposed framework makes extending RL algorithms to SMDPs simple and intuitive, and it offers useful guidance for research on RL algorithms.

Simulation experiments verify the convergence of the algorithms and the effectiveness of the resulting policies, and show that the algorithms obtained in this work converge considerably faster than other SMDP RL algorithms. In the unmanned ground vehicle (UGV) driving simulation, the policies learned by the IVI and SSP RL algorithms produced zero accidents. This application example demonstrates the validity of the theory and the practicality of the algorithms, and the research extends the applicable scope of SMDP reinforcement learning algorithms.
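The Q-value update for state-action pairs can be illustrated with a minimal Python sketch. It assumes the standard average-reward SMDP optimality equation h(i) = max_a [r(i,a) - ρ τ(i,a) + Σ_j p(j|i,a) h(j)], where τ(i,a) is the expected sojourn time and ρ the optimal average reward per unit time, and applies its sample-based form Q(s,a) ← Q(s,a) + α [r - ρ τ + max_b Q(s',b) - Q(s,a)], with ρ estimated as total reward divided by total time over greedy decisions. This SMART-style rule is a swapped-in, well-known SMDP RL scheme and not necessarily the exact formula derived in the dissertation. The names smart_q_update, toy_step and train, the one-state toy SMDP, and the parameter values (alpha = 0.1, epsilon = 0.1) are all hypothetical.

```python
import random
from collections import defaultdict


def smart_q_update(q, s, a, reward, sojourn, s_next, rho, alpha, actions):
    """One SMART-style update of Q(s, a) for an average-reward SMDP (hypothetical helper).

    The reward earned during the sojourn is penalized by rho * sojourn, i.e. by the
    reward the agent would have collected at the current average rate over that time.
    """
    target = reward - rho * sojourn + max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def toy_step(state, action):
    """Hypothetical one-state SMDP: returns (reward, sojourn time, next state).

    Action 0 earns 2 over 1 time unit (rate 2.0); action 1 earns 3 over 2 time
    units (rate 1.5). Action 0 is optimal per unit time, although action 1 pays
    more per decision.
    """
    return (2.0, 1.0, 0) if action == 0 else (3.0, 2.0, 0)


def train(steps=50_000, alpha=0.1, epsilon=0.1, seed=0):
    """Epsilon-greedy SMART-style learning on the hypothetical toy SMDP."""
    rng = random.Random(seed)
    actions = (0, 1)
    q = defaultdict(float)
    total_reward = total_time = 0.0
    rho = 0.0
    s = 0
    for _ in range(steps):
        greedy = max(actions, key=lambda b: q[(s, b)])
        a = rng.choice(actions) if rng.random() < epsilon else greedy
        reward, sojourn, s_next = toy_step(s, a)
        smart_q_update(q, s, a, reward, sojourn, s_next, rho, alpha, actions)
        if a == greedy:
            # Estimate rho only from greedy decisions: reward per unit of elapsed time.
            total_reward += reward
            total_time += sojourn
            rho = total_reward / total_time
        s = s_next
    return q, rho


q, rho = train()
print(max((0, 1), key=lambda b: q[(0, b)]), round(rho, 2))
```

Running the script should print the greedy action 0 together with an average-reward estimate close to 2.0. Because the update charges each action for the time it consumes, the agent prefers the action with the higher reward rate rather than the one with the higher reward per decision, which is the distinction a per-decision MDP model would miss.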

Keywords/Search Tags:

SMDP, average reward criteria, performance sensitivity, RL algorithm

Related items

1	Information retrieval performance enhancement using the average standard estimator and the multi-criteria decision weighted set of performance measures
2	Study On The Improved Average Reward Reinforcement Learning Algorithm Based On Performance Potentials
3	Inverse Reinforcement Learning Under Average Reward Criterion
4	Research On Reward Optimization In Reinforcement Learning
5	Research And Implementation Of Sparse Reward Algorithm Based On Reinforcement Learning For Virtual Shooting Scenes
6	Research And Application Of Deep Reinforcenment Learning Algorithms Based On Reward Shaping
7	Towards Design Of Intrinsic Rewards For Sparse Reward Problem
8	The Design And Realization Of Intelligent Network Health Assessment System
9	Performance Potential-based NDP Optimization Approaches And Application Research For SMDP
10	Asynchronous Optimization Algorithms For SMDP Based On Performance Potential