| The growing number of imaging satellites and increasing capability of their platform and payload have enabled satellites to achieve wider applications and generate greater societal benefits,while new challenges are brought to imaging satellite task planning:Refined control leads to more variables,increased decision dimension,and larger solu-tion space of the task planning problem,and then higher quality algorithms are required?normalized quick response reflects the gradual enhancement of the timeliness of imaging products,and then higher efficiency and stability of algorithms are needed? complicated constraints cause the intricate relationship between variables in the problem,and then a model which unifies access and integrates satellites is demanded.It is precisely because of the aforementioned changes and requirements,the contradictions(versatility vs high efficiency,solution efficiency vs solution accuracy)are increasingly acute in the process of imaging satellite task planning.To alleviate these two pairs of contradictions,the the-sis focuses on integrating deterministic algorithms with reinforcement learning for solving imaging satellite task planning problems.The research work is carried out in the following five aspects:(1)A bi-level optimization model for imaging satellite task planning is proposed.Considering the satellite operations process,imaging satellite task planning problem is described.Based on the analysis of the elements of the problem,a decomposition scheme of the problem and the basic form of the bi-level optimization model are proposed: The matches of tasks and resources are determined during task allocation process,while the execution time of each task is decided during task scheduling process.This model is fundamental to analyzing and solving the problem,which provides a theoretical basis for realizing unified modeling on different satellites and scenarios.(2)A solution framework integrating deterministic algorithms and reinforcement learning is proposed.According to the characteristics of the bi-level optimization model,a solution framework combining reinforcement learning and deterministic algorithms is designed to fully utilize the advantages of reinforcement learning and deterministic al-gorithms: Deterministic algorithms achieve a stably and satisfactory solution to the task scheduling problem within a polynomial time complexity? Reinforcement learning algo-rithms train the empirical formula of the allocation process under complex constraints by simulation scenarios,to replace manual designed of allocation rules.The continuous in-teraction of these two processes realizes the training process of reinforcement learning.There is no need to specifically discuss the analytical nature of the problem and prepare labeled data,to achieve the portability of this approach.(3)Two deterministic algorithms for solving task scheduling problems are proposed.The task scheduling problem can be considered as a mathematical programming model,in which the constraints are divided into four categories and standardized described.A constraint checking algorithm based on the timeline is designed.A heuristic(HADRT)algorithm based on residual task density and a dynamic programming(DP)algorithm based on task sequencing are presented.The determinism of the algorithm guarantees the stability of the solution? polynomial time complexity guarantees the computational efficiency of the solving process? the optimality under certain conditions of HADRT and DP can be proved theoretically,which guarantees the quality of the results.In addition,these two algorithms are decoupled from the constraints,which improves the versatility of the algorithm in imaging satellite task scheduling problems.Experiments verify that HADRT has advantages in time efficiency and operational stability,while its solution quality is at the same level of three advanced comparison algorithms? the DP gets a higher task completion rate and task return rate than other algorithms in an acceptable time?especially in oversubscribe scenarios,the task completion rate and the task return rate are26% and 19% higher than that of ALNS,respectively.(4)Deep Q-learning algorithm is improved to solve task allocation problem.Firstly,the finite Markov decision process(MDP)model of the problem is established.Consid-ering the characteristics of the task assignment problem such as numerous input parame-ters,complex relationships,and low information density,the action space and state space are minimized while ensuring their completeness.The short-term reward and the value function in the MDP are designed in accordance with the domain knowledge,which alle-viates the low training efficiency caused by sparse reward in the training process.Based on this model,an improved deep Q learning algorithm(DQN)is designed,which con-tains a framework oriented to random initial states and an action pruning strategy based on domain knowledge,enabling the algorithm to achieve high-efficiency training.The experiment discusses the ablation of the algorithm attributes at first,and then the perfor-mance of the integrated algorithm is excavated.Through the analysis of the algorithm training and testing process,the feasibility of DQN in solving the task allocation problem and the superiority of the integrated DQN and deterministic algorithm in the task planning problem are proved.(5)Theoretical results in this thesis have been successfully verified by a real-world application project,which is ”Super View-1”,a commercial remote sensing satellite con-stellation of China.Firstly,”Super View-1” task planning system was designed? the in-ternal and external interface design ensured the rationality of the system.Secondly,a bi-level optimization model and two integrated planning algorithms for the ”Super View-1”satellite constellation are put forward to planning tasks submitted to ”Super View-1”.The design of the system,model,and algorithms follow the current operation and control pro-cess and industry specifications,with which practical applications can be well integrated.The results from 14 sets of real ”Super View-1” daily task planning scenarios show that the solution accuracy of the two integrated algorithms proposed in this work(DQN_DP and DQN_HADRT)can get higher profits in all scenarios than the comparing algorithms.It can solve the most experimental scenarios with limited computing resources,proving DQN_DP and DQN_HADRT have higher computational efficiency.When it comes to future complex application scenarios,the proposed integrated algorithms have huge ad-vantages and potentials. |