Maintenance optimization of multi-unit systems is an important area of system reliability engineering. In practical maintenance optimization problems, the state and action spaces are often high-dimensional because of the dependence among units and the coupling between maintenance and other activities, which makes these problems difficult to solve. In this thesis, multi-agent reinforcement learning is used to solve maintenance optimization problems of multi-component systems with large state and action spaces and different structural characteristics.

First, coordinated reinforcement learning (CRL) is used to optimize the maintenance strategy of a series system. Unlike traditional multi-agent reinforcement learning, the agents in CRL make decisions collaboratively, which improves both the convergence of the algorithm and the quality of the resulting maintenance strategy. This thesis applies CRL for the first time to the maintenance optimization of multi-component systems. The state and action variables of each agent are determined by drawing a coordination graph, which also encodes the collaborative relationships between the agents. The global Q-function is decomposed using an edge-based decomposition method, and each agent is then trained with an off-policy method. The results show that the strategies determined by CRL are more cost-effective than those determined by a double deep Q-network (DDQN).

Building on CRL, hierarchical coordinated reinforcement learning (HCRL) is proposed to solve the maintenance optimization problem of multi-component systems with a hierarchical structure. In such systems, the importance of individual components differs significantly and the relationships between components are more complex, so the collaboration efficiency of the agents in the CRL algorithm decreases. HCRL introduces a hierarchical decision-making mechanism for agents
on the basis of CRL, thereby improving the collaborative efficiency of the agents. According to the structural importance measure of each component, this study divides the components into different levels; only agents within the same level cooperate with each other, which reduces coordination complexity. The abstraction of the actions and states of the higher-level agents is also studied. The results show that the strategies determined by HCRL are more cost-effective than those determined by existing methods.

Finally, genetic reinforcement learning (GRL) is used to optimize the maintenance strategy of a manufacturing system with intermediate buffers. Compared with the systems studied in the first two parts, the intermediate buffers weaken the coupling between components and thus reduce the advantage of cooperative decision-making among agents. The GRL used in this thesis combines a genetic algorithm with multi-agent reinforcement learning, and the two algorithms promote each other. To enable communication between the genetic algorithm and reinforcement learning, this thesis proposes a transition algorithm that converts state-based strategies into threshold-based strategies. A complex manufacturing system consisting of disassembly and assembly units is taken as an example to verify the performance of GRL; the results show that GRL outperforms both the genetic algorithm and reinforcement learning used alone.

The above research shows that, compared with the deep reinforcement learning methods that have emerged in recent years, multi-agent reinforcement learning can better integrate knowledge of maintenance optimization with the characteristics of multi-component systems. For the three typical maintenance optimization problems studied in this thesis, it converges faster and yields better maintenance strategies.
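To illustrate the edge-based decomposition of the global Q-function mentioned above, the following is a minimal sketch, not the thesis's implementation: the coordination graph, the hand-written edge Q-function, and all names (`edge_q`, `global_q`, the cost coefficients, the three-component toy structure) are assumptions made purely for illustration. It shows the core idea that the global Q-value is a sum of edge terms over the coordination graph, and that a joint maintenance action can then be chosen by maximizing that sum.

```python
import itertools

# Toy coordination graph for a three-component series system
# (hypothetical structure; the graphs in the thesis are problem-specific).
edges = [(0, 1), (1, 2)]
actions = [0, 1]          # 0 = do nothing, 1 = perform maintenance
states = [2, 0, 1]        # illustrative degradation levels of the components

def edge_q(i, j, s_i, s_j, a_i, a_j):
    """Stand-in for a learned edge Q-function Q_ij(s_i, s_j, a_i, a_j).

    In CRL these terms would be learned; here a hand-written cost model
    (maintenance cost plus a failure-risk penalty) is used instead.
    """
    repair_cost = 5.0 * (a_i + a_j)
    failure_risk = 3.0 * (s_i * (1 - a_i) + s_j * (1 - a_j))
    return -(repair_cost + failure_risk)

def global_q(joint_action):
    """Edge-based decomposition: Q(s, a) is the sum of the edge terms."""
    return sum(
        edge_q(i, j, states[i], states[j], joint_action[i], joint_action[j])
        for i, j in edges
    )

# Greedy joint action by brute-force enumeration; a real CRL agent would
# use message passing (e.g. variable elimination) over the graph instead.
best = max(itertools.product(actions, repeat=3), key=global_q)
# best == (1, 0, 0): only the most degraded component is maintained.
```

Brute-force enumeration is exponential in the number of components; the decomposition is what allows CRL to replace it with message passing over the coordination graph.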
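The transition from a state-based strategy to a threshold-based strategy can also be sketched in miniature. This is an assumed simplification, not the thesis's transition algorithm: the stand-in policy, the function names, and the single scalar degradation state are all hypothetical. It shows only the basic idea that a threshold policy can be read off from a state-based policy as the smallest degradation state at which maintenance is first recommended.

```python
def state_based_policy(state):
    """Stand-in for a learned RL policy over a scalar degradation state.

    Returns 1 (maintain) or 0 (do nothing); this hand-written rule is
    purely for illustration.
    """
    return 1 if state >= 3 else 0

def to_threshold(policy, max_state):
    """Convert a state-based strategy into a threshold-based one.

    The threshold is taken as the smallest degradation state at which
    the state-based policy first recommends maintenance, which is the
    kind of compact representation a genetic algorithm can then evolve.
    """
    for s in range(max_state + 1):
        if policy(s) == 1:
            return s
    return None  # the policy never maintains within the state range

threshold = to_threshold(state_based_policy, max_state=5)  # 3 for this toy policy
```

A compact threshold representation is what lets the genetic algorithm and the reinforcement learner exchange strategies in both directions.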