| With the rapid development of microsatellites and their technologies, fixed-structure satellites can no longer easily meet the requirements that various countries place on multi-task capability, environmental adaptability, and risk resistance, so attention has turned to modular reconfigurable satellites that can change their structure on orbit. On-orbit self-reconfiguration planning for modular reconfigurable satellites has therefore become an important research direction. To describe the topological motion model of a self-reconfigurable satellite, the concept of a configuration is introduced, and three descriptions of it are defined: a matrix description, a set description, and a graph-theoretic description. Assuming that module motion obeys the motion rules of the cubic rotating-module model, the topological motion rules of a module are derived from the physical motion rules. The importance of configuration connectivity to the normal operation of the self-reconfigurable satellite is also pointed out, and a method for checking connectivity is given. To allow multiple modules to move simultaneously, conflict resolution is added to the self-reconfiguration planning and is implemented with a networked evolutionary game. An action space containing all possible actions of a module is defined, and a physics engine is constructed to simulate the actual motion of the modules, which provides the basis for the interaction between module and environment in model-free reinforcement learning. A Markov decision model of the self-reconfiguration process is established: the state and state space of a module are defined, and the immediate reward and the Q function of an action are defined in terms of a distance function between the current satellite configuration and the target configuration. A Q-learning method for distributed on-orbit self-reconfiguration planning is designed, and the policy is optimized as the Q function is iterated. To address the memory overflow caused by storing a large number of module states, a deep neural network is designed to fit the Q function; network training carries out the iteration of the Q function, yielding a distributed self-reconfiguration planning method based on deep reinforcement learning. Simulation experiments are designed to verify the effectiveness of these methods for the self-reconfiguration planning problem. |
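
The connectivity check, the distance-based immediate reward, and the Q-function iteration mentioned above can be illustrated with a minimal Python sketch. It is not the method of the work itself: it assumes a configuration is stored as a set of integer lattice cells (the set description), that two modules are connected when they share a cube face, and it uses the whole configuration as the state rather than a per-module distributed state. The names `is_connected`, `distance_to_target`, `reward`, and `q_update`, the particular distance function, the disconnection penalty of -10, and the values of `alpha` and `gamma` are all hypothetical choices made for illustration.

```python
from collections import deque

# Offsets to the six face-adjacent neighbours on a cubic lattice.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def is_connected(config):
    """Breadth-first search over face contacts.

    `config` is a set (or frozenset) of integer (x, y, z) cells, one per module.
    Returns True when every module is reachable from every other one.
    """
    if not config:
        return True
    start = next(iter(config))
    seen = {start}
    queue = deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBOURS:
            cell = (x + dx, y + dy, z + dz)
            if cell in config and cell not in seen:
                seen.add(cell)
                queue.append(cell)
    return len(seen) == len(config)


def distance_to_target(config, target):
    """Assumed distance function: number of target cells not yet occupied."""
    return len(target - config)


def reward(config, next_config, target, penalty=-10.0):
    """Immediate reward: decrease in distance to the target configuration.

    A move that disconnects the configuration receives the assumed penalty.
    """
    if not is_connected(next_config):
        return penalty
    return float(distance_to_target(config, target)
                 - distance_to_target(next_config, target))


def q_update(q, state, action, r, next_state, next_actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step on the dictionary `q`.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Unseen (state, action) pairs default to 0; states must be hashable.
    """
    best_next = max((q.get((next_state, a), 0.0) for a in next_actions), default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (r + gamma * best_next - old)
    return q[(state, action)]
```

In this sketch, moving the module at `(2, 0, 0)` of a three-module line to `(1, 1, 0)` reduces the assumed distance to an L-shaped target by one, so `reward` returns a positive value and `q_update` raises the stored value for that state-action pair. The table `q` grows with the number of visited configurations, which is the storage problem that motivates replacing it with a neural-network approximation of the Q function.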