| Complex optimization problems arise widely in practical production processes such as engineering design, metallurgy, and pharmaceuticals. As a typical optimization problem, job scheduling is characterized by multiple constraints, multiple objectives, non-linearity, and non-differentiability. Complex job scheduling problems in practical production are difficult to solve because of the limited scale of production resources and the uncertainty of production processes. Most job scheduling problems have been proven to be NP-hard. Against the background of economic globalization, distributed manufacturing has become an unstoppable trend owing to its fault tolerance, scalability, and processing speed, and the study of distributed scheduling problems has attracted the attention of many scholars. With global warming and environmental degradation, environmental protection and the improvement of energy utilization efficiency have attracted increasing attention. Green scheduling improves economic benefits while achieving energy conservation and emission reduction through rational resource allocation and production planning. This study investigates flow-shop scheduling problems with different constraints in a distributed context. We focus on the energy consumption indicators of production and processing while optimizing time and economic indicators. Distributed scheduling problems with multiple constraints and objectives are very complex to solve, and the optimization of scheduling methods is currently a hot research topic. Unlike traditional optimization methods, meta-heuristic algorithms are fast and problem-independent, and they are widely used to solve production scheduling problems. In this thesis, reinforcement learning mechanisms are utilized to guide meta-heuristic algorithms in solving complex continuous optimization problems and distributed green production scheduling problems. The contributions of this 
thesis are as follows: at the problem level, the existing distributed flow-shop scheduling models are extended with different constraints and optimization objectives; at the algorithmic level, the combination of reinforcement learning mechanisms and meta-heuristic algorithms is investigated to solve different optimization problems; at the application level, the scheduling models are abstracted from different scenarios in the continuous casting process of the actual steelmaking industry. The main research content and work of this thesis are as follows: (1) A co-evolutionary migrating birds optimization algorithm based on policy gradient (CMBO-PG) is proposed for solving complex continuous optimization problems. In CMBO-PG, a long short-term memory (LSTM) network is utilized as a strategy selector to guide the selection of different mutation strategies. The policy gradient (PG) method of reinforcement learning is used as a parameter optimizer to update the LSTM network, which dynamically adjusts the selection probabilities of the different strategies. To address the shortcomings of the original migrating birds optimization (MBO) algorithm, a dual-population co-evolution mechanism is designed to enhance the local search ability through communication and interaction between populations. The proposed algorithm is tested on the CEC2017 benchmark suite. The experimental results show that CMBO-PG outperforms 12 compared algorithms in solving complex continuous problems and that the multi-strategy selection mechanism based on reinforcement learning improves the performance of the algorithm. (2) A cooperative meta-heuristic algorithm based on Q-learning (CMAQ) is proposed for solving the energy-efficient distributed no-wait flow-shop scheduling problem with sequence-dependent setup time (EDNWFSP-SDST). A mixed-integer linear programming (MILP) model that minimizes the maximum completion time (makespan) and total energy consumption (TEC) is constructed. The MILP model is 
abstracted from the steelmaking and continuous casting stage of the steelmaking process. A bi-population cooperative framework based on double Q-learning is designed to guide the meta-heuristic algorithms in selecting operators. According to the properties of EDNWFSP-SDST, a knowledge-based energy-saving strategy is proposed to improve both the maximum completion time and TEC. The performance of CMAQ is compared with that of 4 state-of-the-art algorithms on two different benchmark suites. The simulation results verify the superiority of the proposed algorithm and the effectiveness of the Q-learning mechanism. To address the computational resource and time consumption of reinforcement learning mechanisms, the performance and time consumption of CMAQ are compared with those of a CMAQ variant that uses random selection strategies, and a detailed analysis is conducted. The experimental results show that the Q-learning-based learning mechanism consumes few additional computational resources and obtains a significant performance improvement. (3) A meta-heuristic algorithm with policy gradient (MHA-PG) is proposed for solving the energy-aware distributed no-wait flow-shop scheduling problem in a heterogeneous factory system (EDNWFSP-HFS), with the objectives of minimizing total tardiness (TTD) and TEC. Compared with the EDNWFSP-SDST problem in (2), EDNWFSP-HFS additionally considers delivery time constraints and heterogeneous factory constraints. The MILP model of EDNWFSP-HFS provides a more detailed description of the continuous casting stage. In MHA-PG, PG is used to guide the selection of operators. Compared with the Q-learning mechanism, PG outputs the selection probabilities of all operators end-to-end, which enhances stability. A large number of comparative experiments show that MHA-PG outperforms 3 state-of-the-art algorithms in solving EDNWFSP-HFS. |
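
The idea of letting a Q-learning mechanism guide a meta-heuristic in selecting operators, as described in (2), can be illustrated with a minimal sketch. It is not the CMAQ implementation: it uses a single Q-table rather than double Q-learning, one population, and a toy single-machine cost in place of the EDNWFSP-SDST objectives. All names (`QLearningOperatorSelector`, `swap`, `insert`, `local_search`), the two-state encoding, the 0/1 reward, and the parameter values are assumptions made for illustration.

```python
import random


class QLearningOperatorSelector:
    """Tabular Q-learning over (state, operator) pairs with epsilon-greedy choice."""

    def __init__(self, n_states, n_operators, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.q = [[0.0] * n_operators for _ in range(n_states)]
        self.alpha = alpha      # learning rate (assumed value)
        self.gamma = gamma      # discount factor (assumed value)
        self.epsilon = epsilon  # exploration probability (assumed value)
        self.rng = random.Random(seed)

    def select(self, state):
        # Explore with probability epsilon, otherwise pick the best-valued operator.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        row = self.q[state]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, state, operator, reward, next_state):
        # Standard Q-learning update toward reward + gamma * max_a Q(next_state, a).
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][operator] += self.alpha * (target - self.q[state][operator])


def swap(perm, rng):
    """Hypothetical operator: exchange two randomly chosen jobs."""
    i, j = rng.sample(range(len(perm)), 2)
    new = perm[:]
    new[i], new[j] = new[j], new[i]
    return new


def insert(perm, rng):
    """Hypothetical operator: remove one job and reinsert it elsewhere."""
    new = perm[:]
    job = new.pop(rng.randrange(len(new)))
    new.insert(rng.randrange(len(new) + 1), job)
    return new


def local_search(perm, cost, selector, operators, iters=200, seed=1):
    """Improve a job permutation; the selector learns which operator pays off."""
    rng = random.Random(seed)
    best, best_cost = perm[:], cost(perm)
    state = 0  # assumed encoding: 0 = last move improved (or start), 1 = it did not
    for _ in range(iters):
        op = selector.select(state)
        cand = operators[op](best, rng)
        cand_cost = cost(cand)
        improved = cand_cost < best_cost
        next_state = 0 if improved else 1
        selector.update(state, op, 1.0 if improved else 0.0, next_state)
        if improved:
            best, best_cost = cand, cand_cost
        state = next_state
    return best, best_cost


# Toy objective (assumption): total completion time on a single machine.
PROC_TIMES = [5, 3, 8, 1, 4]


def total_completion_time(perm):
    elapsed, total = 0, 0
    for job in perm:
        elapsed += PROC_TIMES[job]
        total += elapsed
    return total


selector = QLearningOperatorSelector(n_states=2, n_operators=2)
best, best_cost = local_search([0, 1, 2, 3, 4], total_completion_time,
                               selector, [swap, insert])
```

A policy-gradient selector, as in (1) and (3), would replace the Q-table lookup in `select` with sampling from a learned probability distribution over operators; that variant is not shown here.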