
Identification And Internal Model Control Of Fractional Order Systems

Posted on: 2017-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: L T Li
Full Text: PDF
GTID: 2348330491461751
Subject: Control engineering
Abstract/Summary:
Policy evaluation and learning control are the two central problems in reinforcement learning, and accurate policy evaluation provides a strong foundation for solving the learning-control problem. Among the many policy evaluation algorithms, temporal difference (TD) learning is the most popular. Regularization is a way to incorporate prior knowledge into the objective function and an effective means of preventing the value-function approximator from over-fitting the data. Through the choice of basis functions, it yields sparse solutions, which simplifies the structure of the approximator and improves its generalization ability. Incremental techniques can significantly reduce an algorithm's computational complexity without sacrificing sample efficiency. Both ideas have already been applied to the classical temporal difference algorithms, but their application to several recently proposed algorithms remains open.

Building on previous research, this thesis further studies regularization in least-squares temporal difference (LSTD) learning and the reduction of its computational complexity.

First, we propose a least-squares temporal difference learning algorithm with eligibility traces based on the regularized extreme learning machine (ELM), which addresses the problem caused by the random initialization of ELM parameters. The method effectively reduces the influence of this random initialization and approximates the true value function more closely than other policy evaluation algorithms. Second, to solve the l1-regularization problem in least-squares temporal difference learning with gradient corrections, we propose a least-squares temporal difference learning algorithm based on least angle regression.
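To make the first idea concrete, here is a minimal sketch (not the thesis's exact algorithm) of LSTD policy evaluation over ELM-style features: the hidden-layer weights are drawn at random and never trained, only the linear output layer is fitted, and a ridge term regularizes the solve so the result is less sensitive to the random initialization. The 1-D chain MDP and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D chain MDP: states 0..n-1, deterministic right moves,
# reward 1 on entering the terminal state n-1, discount gamma.
n, gamma, lam = 10, 0.95, 1e-2

# ELM-style random feature map: hidden weights are drawn once at random
# and never trained; only the linear output layer is fitted.
n_hidden = 25
W = rng.normal(size=n_hidden)
c = rng.normal(size=n_hidden)

def phi(s):
    """Random-hidden-layer (ELM) features of a scalar state."""
    return np.tanh(W * s + c)

# One sweep of transitions (s, r, s') along the chain.
transitions = [(s, 1.0 if s == n - 2 else 0.0, s + 1) for s in range(n - 1)]

# Ridge-regularized LSTD: solve (A + lam*I) theta = b.
A = lam * np.eye(n_hidden)
b = np.zeros(n_hidden)
for s, r, s2 in transitions:
    f = phi(s)
    f2 = np.zeros(n_hidden) if s2 == n - 1 else phi(s2)  # terminal: zero features
    A += np.outer(f, f - gamma * f2)
    b += f * r
theta = np.linalg.solve(A, b)

# Estimated state values; the true values are gamma**(n-2-s) for s <= n-2.
V = np.array([phi(s) @ theta for s in range(n)])
```

The ridge term `lam` plays the role of the regularization discussed above: without it, the overparameterized random-feature system is ill-conditioned and the solution would vary wildly across random draws of `W`.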
This algorithm produces sparse solutions to the policy evaluation problem, effectively reduces the function space, and avoids over-fitting. Finally, to address the high computational complexity of least-squares temporal difference learning with gradient corrections, an incremental technique is introduced; the resulting algorithm is named incremental least-squares temporal difference learning with gradient corrections. It combines the low per-step complexity of first-order algorithms with the high sample efficiency of least-squares-based algorithms, and is therefore better suited to practical problems.
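One standard way an incremental technique lowers the cost of a least-squares method (a generic illustration, not necessarily the thesis's construction) is to maintain the inverse of the LSTD matrix directly: each transition adds a rank-one term to A, so the Sherman-Morrison identity updates A⁻¹ in O(d²) per step instead of re-solving a d×d system in O(d³). The synthetic random transitions below are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(1)
d, gamma, eps = 5, 0.9, 1.0

# Start from a ridge prior so A is invertible before any data arrives.
A = eps * np.eye(d)          # kept only to verify the incremental update
A_inv = np.eye(d) / eps      # running inverse, updated in O(d^2) per step
b = np.zeros(d)

def sm_update(A_inv, u, v):
    """Sherman-Morrison rank-one update of A^{-1} after A += outer(u, v)."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

# Stream of synthetic transitions (random features, for illustration).
for _ in range(200):
    phi = rng.normal(size=d)       # features of s
    phi_next = rng.normal(size=d)  # features of s'
    r = rng.normal()               # reward
    u, v = phi, phi - gamma * phi_next
    A += np.outer(u, v)
    A_inv = sm_update(A_inv, u, v)  # O(d^2), no O(d^3) re-solve
    b += phi * r
    theta = A_inv @ b               # current value-function weights, O(d^2)
```

Because the Sherman-Morrison update is algebraically exact, the streaming estimate `theta` matches what a full batch solve of the same system would return at every step, which is the sense in which sample utilization is preserved while complexity drops.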
Keywords/Search Tags: reinforcement learning, policy evaluation, regularization, incremental least-squares temporal difference learning, extreme learning machine