| As a widely used benchmark in control theory and robotics, the inverted pendulum is nonlinear, underactuated, and inherently unstable, and it has been applied in fields such as transportation and manufacturing. For multi-stage inverted pendulums, traditional control methods can maintain balance under limited disturbance but require a precise mathematical model. Deep reinforcement learning algorithms, by contrast, have strong decision-making ability, can overcome these limitations, and rely only on environment and agent state information for control. This paper therefore studies the balance control problem of the multi-stage inverted pendulum and proposes two balance control algorithms based on deep reinforcement learning, both built on soft actor-critic (SAC): an SAC algorithm based on off-policy sampling and an ensemble automatic-entropy SAC algorithm. The main research contents are as follows. (1) To address the long training time of deep reinforcement learning, an SAC multi-stage inverted pendulum balance control algorithm based on off-policy sampling is proposed. Specifically, off-policy sampling is used to speed up training, and a reward function with a safety margin is designed to guide the inverted pendulum to reach and maintain a balanced state quickly. To verify its effectiveness, the algorithm is compared experimentally with the baseline SAC algorithm. Simulation results show that the SAC algorithm based on off-policy sampling achieves faster reward convergence and a more stable training process on the two-stage and three-stage inverted pendulums, which maintain balance under 10% and 3% random disturbances, respectively. (2) To address problems of existing deep reinforcement learning algorithms, such as value function fitting errors and the difficulty of balancing exploration and exploitation, an automatic-entropy SAC multi-stage inverted pendulum balance control algorithm based on an ensemble framework is proposed. Specifically, an ensemble framework is first used to reweight the Bellman operator and reduce accumulated error. Second, the entropy coefficient is added to the network input, and the upper confidence bound (UCB) algorithm is used to balance exploration and exploitation. To verify the effectiveness of the proposed ensemble MetaSAC algorithm, an ablation experiment compares and analyzes the impact of the ensemble framework and automatic entropy on the performance of deep reinforcement learning. The proposed algorithm is also compared with SAC-v2. Simulation results show that the ensemble MetaSAC algorithm has a more stable training process on the three-stage inverted pendulum, and that the two-stage and three-stage inverted pendulums still maintain balance under 10% and 5% random disturbances, respectively. |
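
The abstract does not give formulas for the reweighted Bellman operator or the confidence-bound exploration rule. As a rough illustration only, the sketch below follows one common SUNRISE-style formulation from the ensemble reinforcement learning literature: each transition's TD error is scaled by a weight that shrinks as the ensemble of target Q-values disagrees, and actions are chosen by the ensemble mean plus a multiple of the ensemble standard deviation. The function names, the `temperature` and `lam` parameters, and the use of the ensemble mean as the bootstrap target (with SAC's entropy term omitted) are assumptions made for this example, not the paper's method.

```python
import math
from statistics import fmean, pstdev


def bellman_weight(target_qs, temperature=10.0):
    """Confidence weight in [0.5, 1.0]: smaller when the ensemble disagrees.

    Computes sigmoid(-std * temperature) + 0.5 over the ensemble's target
    Q-values for one (state, action) pair. `temperature` is hypothetical.
    """
    x = min(pstdev(target_qs) * temperature, 700.0)  # guard against overflow
    return 1.0 / (1.0 + math.exp(x)) + 0.5


def weighted_td_loss(q_pred, reward, next_target_qs, gamma=0.99, temperature=10.0):
    """Squared TD error for one transition, scaled by the confidence weight.

    Simplification: the bootstrap target is the ensemble mean of the next-state
    target Q-values, and the entropy bonus used by SAC is left out.
    """
    target = reward + gamma * fmean(next_target_qs)
    return bellman_weight(next_target_qs, temperature) * (q_pred - target) ** 2


def ucb_action(candidate_qs, lam=1.0):
    """Return the index of the candidate action with the highest UCB score.

    `candidate_qs[i]` holds the ensemble's Q-estimates for candidate action i;
    the score is mean(Q) + lam * std(Q), so uncertain actions get a bonus.
    """
    scores = [fmean(qs) + lam * pstdev(qs) for qs in candidate_qs]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Ensemble agrees: full weight. Ensemble disagrees: weight near 0.5.
    print(bellman_weight([1.0, 1.0, 1.0]))
    print(bellman_weight([0.0, 2.0]))
    # Two candidate actions with equal mean; the more uncertain one is chosen.
    print(ucb_action([[1.0, 1.0], [0.5, 1.5]], lam=1.0))
```

In this form, down-weighting transitions with uncertain targets limits how far value-estimation errors propagate through bootstrapping, while the standard-deviation bonus steers exploration toward actions the ensemble has not yet agreed on.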