| Satellite control technology, and satellite propulsion control in particular, is critically important: it determines the efficiency of satellite motion and indirectly affects the efficiency of mission execution. Space technology is among the most complex fields of current research, and the exploration of space resources and the study of strategic space deployment have been an essential part of national policy. Traditional automatic control methods require precise measurement of the satellite’s position, velocity, and other parameters, followed by the construction of complex mathematical models for parameter tuning. This process is usually time-consuming, subject to high latency, and unstable, and it cannot achieve real-time control, which poses a considerable potential risk to satellite safety. Among the many emerging algorithms, reinforcement learning can learn an optimal control strategy by interacting with the environment, without knowledge of the system model. Compared with traditional automatic control methods, reinforcement learning is more adaptive and robust and can handle complex, nonlinear systems. In this context, this study investigates how reinforcement learning can control satellites to accomplish their target missions, focusing on the application of reinforcement learning algorithms to satellite control and exploring their advantages and potential. The specific work of this study is as follows. First, this paper models the motion of a satellite in the inertial coordinate system, the satellite space exploration problem, and the optimal satellite propulsion control problem, and solves them using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. To address the insufficient exploration capability and limited training capability of TD3 in a vast state space, this study improves the algorithm with three additions: compound noise, which raises the exploration rate in the enormous state space; the Running Mean Std normalization method, which resolves excessive differences in scale across data dimensions; and state prediction based on LSTM networks. The resulting algorithm, Advanced-TD3, realizes satellite motion control for the target point exploration mission, and on this basis a satellite propulsion control system based on deep reinforcement learning is constructed. Experimental comparison verifies the feasibility of the improved algorithm. Second, this paper models satellite motion in terms of the six classical orbital elements, the control problem under a restricted pulse frequency, and the orbiting strategy problem. Building on the satellite propulsion control system based on deep reinforcement learning, this study reconstructs the Markov process and adds a composite artificial potential field module to the Advanced-TD3 algorithm as an additional guiding reward, thereby improving the algorithm’s performance under complex constraints. In addition, two deorbit control strategies are designed for the control problem under a restricted satellite pulse frequency: an exploration mode-based strategy and a target navigation-based strategy, both of which control the satellite to reach the target orbit. Experimental comparison verifies that both strategies can accomplish the mission objectives under the improved Advanced-TD3 algorithm. |
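
The Running Mean Std normalization named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' implementation, so the class below is an assumption: it uses Welford's online update, and the class name, its parameters, and the sample observations are all hypothetical. It shows how state components on very different scales can be brought to a comparable range before they reach the networks.

```python
import math


class RunningMeanStd:
    """Track a per-dimension running mean and variance (Welford's method)."""

    def __init__(self, dim, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim   # running sum of squared deviations from the mean
        self.eps = eps          # guards against division by zero

    def update(self, x):
        """Fold one observation vector into the running statistics."""
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def std(self):
        """Population standard deviation of each dimension seen so far."""
        if self.n == 0:
            return [1.0] * len(self.mean)
        return [math.sqrt(m2 / self.n) for m2 in self.m2]

    def normalize(self, x):
        """Rescale x to roughly zero mean and unit variance per dimension."""
        return [(xi - m) / (s + self.eps)
                for xi, m, s in zip(x, self.mean, self.std())]


# Hypothetical observations: a position component in metres and a
# velocity component in km/s, which differ by orders of magnitude.
rms = RunningMeanStd(dim=2)
for obs in ([7.0e6, 7.5], [7.2e6, 7.3], [6.8e6, 7.7]):
    rms.update(obs)
normalized = rms.normalize([7.0e6, 7.5])
```

After normalization both components lie on a similar scale even though the raw values differ by about six orders of magnitude, which is the kind of dimension mismatch the abstract describes.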