Research On Multi-level Inverted Pendulum Balance Control Based On Deep Reinforcement Learning

Posted on:2024-07-19

Degree:Master

Type:Thesis

Country:China

Candidate:Z J Liu

Full Text:PDF

GTID:2530307097456984

Subject:Control Science and Engineering

Abstract/Summary:

As a commonly used benchmark model in control theory and robotics, the inverted pendulum is characterized by nonlinearity, underactuation, and absolute instability, and it has been widely applied in fields such as transportation and manufacturing. For multi-stage inverted pendulums, traditional control methods can maintain balance under a certain level of disturbance, but they require a precise mathematical model. In contrast, deep reinforcement learning algorithms have strong decision-making ability, can overcome the limitations of traditional methods, and rely only on environment and agent state information for control decisions. On this basis, this paper studies the balance control problem of the multi-stage inverted pendulum and proposes two balance control algorithms based on deep reinforcement learning: a soft actor-critic (SAC) balance control algorithm based on off-policy sampling, and an automatic-entropy SAC balance control algorithm based on an ensemble framework. The main research contents are as follows:

(1) To address the long training time of deep reinforcement learning, a SAC balance algorithm for the multi-stage inverted pendulum based on off-policy sampling is proposed. Specifically, the off-policy sampling technique is used to speed up training, and a reward function with a safety margin is designed to guide the inverted pendulum to reach and maintain a balanced state quickly. To verify the effectiveness of the algorithm, it is compared experimentally with the baseline SAC algorithm. Simulation results show that the SAC algorithm based on off-policy sampling achieves faster reward convergence and a more stable training process on the two-stage and three-stage inverted pendulums, which remain balanced under 10% and 3% random disturbances, respectively.

(2) To address problems of existing deep reinforcement learning algorithms, such as value function fitting errors and the difficulty of balancing exploration and exploitation, an automatic-entropy SAC balance control algorithm for the multi-stage inverted pendulum based on an ensemble framework is proposed. Specifically, the ensemble framework is first used to reweight the Bellman operator to reduce cumulative error. Second, the entropy coefficient is introduced into the network input, and an upper confidence bound (UCB) algorithm is used to balance exploration and exploitation. To verify the effectiveness of the proposed ensemble MetaSAC algorithm, an ablation experiment is designed to compare and analyze the impact of the ensemble framework and automatic entropy on the performance of deep reinforcement learning. In addition, the proposed algorithm is compared with SAC-v2. Simulation results show that the ensemble MetaSAC algorithm has a more stable training process on the three-stage inverted pendulum, and that the two-stage and three-stage inverted pendulums remain balanced under 10% and 5% random disturbances, respectively.
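
The abstract does not give formulas for the ensemble reweighting of the Bellman backup or for the confidence-bound exploration in (2). As a rough illustration only, the sketch below uses one common form of these ideas: a backup weight of sigmoid(-std * T) + 0.5 computed from the spread of ensemble target Q-values, and action selection by mean plus a multiple of the ensemble standard deviation. Whether the thesis uses these exact forms is an assumption, and the names `ucb_select`, `backup_weight`, `weighted_td_loss`, `lam`, and `temperature` are hypothetical.

```python
import math
from statistics import mean, pstdev


def ucb_select(candidates, q_ensemble, lam=1.0):
    """Return the candidate action with the highest mean + lam * std
    of the ensemble's Q-value estimates (assumed UCB form)."""
    def score(action):
        qs = [q(action) for q in q_ensemble]
        return mean(qs) + lam * pstdev(qs)
    return max(candidates, key=score)


def backup_weight(target_qs, temperature=10.0):
    """Confidence weight in (0.5, 1.0]: sigmoid(-std * T) + 0.5.
    High disagreement among ensemble targets gives a weight near 0.5."""
    x = min(pstdev(target_qs) * temperature, 700.0)  # cap to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(x)) + 0.5


def weighted_td_loss(q_pred, target, weight):
    """Squared TD error scaled by the confidence weight."""
    return weight * (q_pred - target) ** 2


# Toy usage: three hypothetical Q-functions for one fixed state,
# each mapping a scalar action to a value estimate.
q_ensemble = [
    lambda a: -(a - 0.2) ** 2,
    lambda a: -(a - 0.4) ** 2,
    lambda a: -(a + 0.1) ** 2,
]
candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
action = ucb_select(candidates, q_ensemble, lam=1.0)
weight = backup_weight([q(action) for q in q_ensemble])
loss = weighted_td_loss(q_pred=0.0, target=-0.1, weight=weight)
```

With `lam=0` the selection reduces to picking the action with the highest mean estimate; a larger `lam` favors actions on which the ensemble members disagree, which is the exploration side of the trade-off.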

Keywords/Search Tags:

multi-stage inverted pendulum, deep reinforcement learning, Markov decision process, SAC algorithm, continuous motion control

Related items

1	State Estimation And Policy Learning In Partially Observable Markov Decision Processes
2	Partial Observation Of Memory-based Reinforcement Learning Problems In Markov Decision Process
3	Research On Intelligent Decision Model Based On Deep Reinforcement Learning
4	Generation Of Adaptive Decision-making Ability Of Agents Based On Deep Reinforcement Learning
5	Research On Action Control And Decision Based On Reinforcement Learning
6	Application Of Markov Decision Process In Wireless Caching Networks
7	Research On Autonomous Driving Human-like Car-following Decision Algorithm Based On Deep Reinforcement Learning
8	Research On Cell Migration Process Based On Deep Reinforcement Learning And The Algorithm Improvement
9	Optimal Control Of Delayed Systems And Its Applications To Wheeled Inverted Pendulum
10	Research On AUV Motion Planning Method Based On Maximum Entropy Deep Reinforcement Learning