
Inverse Reinforcement Learning Based On Sample Evaluation

Posted on: 2018-07-16    Degree: Master    Type: Thesis
Country: China    Candidate: X Y Wang    Full Text: PDF
GTID: 2428330623450704    Subject: Control Science and Engineering
Abstract/Summary:
The intelligent era is approaching, and intelligent technology is having a tremendous impact on all aspects of life. To respond better to changes in the form of future warfare, we must attach importance to learning methods. Reinforcement learning is currently the basic method for solving problems of interaction with an environment, but its performance depends strongly on the reward function. Inverse reinforcement learning can infer a plausible reward function for a system from excellent expert demonstrations. However, its learning samples are currently limited to those excellent demonstrations; mediocre samples and negative samples are simply discarded. This thesis studies inverse reinforcement learning based on sample evaluation, making full use of the value of every sample: drawing lessons from positive examples and learning from negative ones as well. The work falls into two parts:

1. We modify the maximum entropy inverse reinforcement learning framework so that it can process positive and negative evaluation samples simultaneously. Classical inverse reinforcement learning handles only the excellent demonstrations given by experts and cannot exploit less-than-ideal samples. With a few modifications, the maximum entropy framework becomes capable of handling both sample classes: the reward function we seek is one that favors choices close to the positive samples and away from the negative samples given by the experts. When a set of candidate reward functions is found, the one with the largest entropy is chosen.

2. We extend the previous work from two quality categories to multiple categories of demonstration quality. A set of basic reward functions is obtained by pairwise comparison of the demonstration samples under a comprehensive assessment index. The demonstration samples are then re-evaluated using the functions generated in the previous step, and a boosting algorithm combines the reward functions so that the combined function evaluates all samples the same way the experts do.

The main contribution of this thesis is to enlarge the range of learning samples available to inverse reinforcement learning algorithms. It is a consensus that such algorithms require optimal samples to achieve ideal results; in battlefield mission planning, however, it is hard to decide whether a given choice is optimal, whereas comparing two choices is much easier. The proposed algorithm makes inverse reinforcement learning less dependent on sample quality and broadens the circumstances in which samples can be used. This is crucial for bringing inverse reinforcement learning from theory to battlefield application.
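The abstract does not give the exact objective behind the first contribution. As a minimal sketch only, a maximum-entropy-style weight update that is attracted to positive demonstrations and repelled from negative ones might look as follows; the linear reward model, the repulsion weight `lam`, and all names here are assumptions for illustration, not the thesis' formulation:

    import numpy as np

    def maxent_update(w, mu_pos, mu_neg, mu_policy, lr=0.1, lam=0.5):
        """One gradient step on a hypothetical objective that raises the
        likelihood of positive demos and lowers that of negative demos,
        assuming a linear reward r(s) = w . phi(s).

        w         -- current reward weights, shape (d,)
        mu_pos    -- mean feature counts of positive demonstrations, shape (d,)
        mu_neg    -- mean feature counts of negative demonstrations, shape (d,)
        mu_policy -- expected feature counts under the soft-optimal policy
                     induced by the current w, shape (d,)
        lam       -- assumed weight of the repulsion from negative samples
        """
        # The classic MaxEnt IRL gradient pulls w toward the positive
        # demonstrations; the extra term pushes it away from the negatives.
        grad = (mu_pos - mu_policy) - lam * (mu_neg - mu_policy)
        return w + lr * grad

Any standard feature-expectation estimator from maximum entropy inverse reinforcement learning could supply `mu_policy` here; only the extra repulsion term departs from the classic update.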
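For the second contribution, the abstract names boosting over base reward functions derived from pairwise expert comparisons but does not spell out the combination rule. A RankBoost-style loop is one plausible reading; the sketch below is an assumption, not the thesis' algorithm, and every identifier is hypothetical:

    import numpy as np

    def boost_rewards(base_rewards, pairs, trajectories, n_rounds=10):
        """Combine base reward functions so their weighted sum ranks
        trajectory pairs the way the expert does (RankBoost-style sketch).

        base_rewards -- list of callables: trajectory -> scalar score
        pairs        -- list of (i, j) pairs meaning the expert rates
                        trajectories[i] above trajectories[j]
        """
        # margins[k, p] = +1 if base reward k agrees with expert pair p, else -1
        margins = np.array([
            [np.sign(r(trajectories[i]) - r(trajectories[j])) for (i, j) in pairs]
            for r in base_rewards
        ])
        D = np.full(len(pairs), 1.0 / len(pairs))  # distribution over pairs
        alphas, chosen = [], []
        for _ in range(n_rounds):
            scores = margins @ D                   # weighted agreement per base reward
            best = int(np.argmax(np.abs(scores)))
            rbar = float(scores[best])
            if abs(rbar) >= 1.0:                   # perfect (or inverted) ranker
                alphas.append(np.sign(rbar)); chosen.append(best)
                break
            alpha = 0.5 * np.log((1.0 + rbar) / (1.0 - rbar))
            alphas.append(alpha); chosen.append(best)
            # Re-weight: pairs the chosen reward gets wrong gain weight.
            D = D * np.exp(-alpha * margins[best])
            D /= D.sum()
        def combined(traj):
            return sum(a * base_rewards[k](traj) for a, k in zip(alphas, chosen))
        return combined

The re-weighting step is what forces later rounds to focus on the expert comparisons that the current combination still violates, which matches the abstract's goal of a combined function that evaluates all samples the way the experts do.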
Keywords/Search Tags:Inverse Reinforcement Learning, Sample Evaluation, Learning from Failure, Boosting, Reinforcement Learning