With social development and technological progress, air conditioning has become indispensable in daily life and work. However, the heavy use of air conditioners has increased energy consumption, and as air conditioning systems grow more complex, faults occur more often and the resulting energy losses become more serious. Research on fault diagnosis for air conditioning systems has therefore become a focus of energy-saving work. Because an air conditioning system has many subsystems and a complicated structure, and because reward design depends on experience and subjective judgment, it is difficult to specify a reward directly for the fault diagnosis problem. This difficulty also contributes to reward sparsity, which lowers the learning efficiency of the algorithm and makes convergence difficult. Inverse reinforcement learning addresses the problem of hard-to-specify rewards by learning the reward from expert demonstrations. With the rise and continued development of artificial intelligence, more and more artificial intelligence techniques are being applied in the building sector, where they have demonstrated their value. Combining inverse reinforcement learning with air conditioning system fault diagnosis is therefore in line with the trend toward intelligent buildings. This paper applies inverse reinforcement learning to the fault diagnosis problem of air conditioning systems. To address the sparsity of expert sample data, a generative adversarial network is trained to generate additional expert samples, which effectively addresses the problem of sparse rewards in inverse reinforcement learning. The improved maximum entropy inverse reinforcement learning algorithm based on generative adversarial networks is used for air conditioning system parameter state prediction and fault
diagnosis. Because the air conditioning system has many parameter states and recovering the reward function for each task one by one is relatively expensive, lifelong learning is introduced to transfer the reward function between tasks, so that multiple air conditioning system faults can be diagnosed. The main content comprises the following three parts: (1) To address the slow learning caused by sparse expert samples, a maximum entropy inverse reinforcement learning algorithm based on generative adversarial networks (GAN-MaxEnt IRL) is proposed. During learning, the generative adversarial network is trained on the expert samples and used to generate virtual expert samples; non-expert samples are then generated with a random policy, and the two are combined into a mixed sample set. The reward function is modeled with the maximum entropy probability model, and the optimal reward function is solved by gradient descent. Based on the solved reward function, a forward reinforcement learning method is used to find the optimal policy, which in turn generates further non-expert samples; the mixed sample set is then rebuilt and the optimal reward function is solved iteratively. The proposed algorithm and the MaxEnt IRL algorithm are applied to the classic Object World and Mountain Car problems. Experiments show that, when expert samples are sparse, the proposed algorithm recovers the reward function better and has better convergence performance. (2) The GAN-MaxEnt IRL algorithm proposed in Part (1) is applied to fault prediction and diagnosis for air conditioning systems. The initial state of the air conditioning system is the input to the algorithm, which solves for the reward function for optimal state prediction. Based on the solved reward function, the forward reinforcement learning TD algorithm is used to predict the state, and the state value function corresponding to the predicted state is compared with the set threshold to
diagnose air conditioning system faults. Simulation results show that this method can effectively diagnose air conditioning faults and reduce the cost and energy losses they cause, yielding economic benefits and contributing to building energy saving. (3) Because the air conditioning system is complex and has many fault types, recovering the reward function for the optimal prediction of each parameter state one by one is costly, so the idea of lifelong learning is introduced. By learning multiple tasks, a reward function model is constructed that can approximately represent more tasks. Through continued learning and training, the reward function is repeatedly revised and refined so that it can be applied to more related tasks and support the prediction and diagnosis of more faults.
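
The diagnosis step in Part (2), where a state value is compared with a threshold, can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a small discretized state space, a reward vector standing in for the one recovered by inverse reinforcement learning, tabular TD(0) as the TD algorithm, and the convention that a value below the threshold indicates a fault. The function names, the toy transitions, and all numeric settings are hypothetical.

```python
import numpy as np

def td0_state_values(transitions, reward, n_states,
                     alpha=0.5, gamma=0.9, sweeps=200):
    """Estimate V(s) with tabular TD(0) from observed (state, next_state) pairs.

    `reward[s]` is assumed to be the learned reward for entering state s.
    """
    values = np.zeros(n_states)
    for _ in range(sweeps):
        for s, s_next in transitions:
            td_target = reward[s_next] + gamma * values[s_next]
            values[s] += alpha * (td_target - values[s])
    return values

def diagnose(values, predicted_state, threshold):
    """Flag a fault when the predicted state's value is below the threshold
    (the direction of the comparison is an assumption of this sketch)."""
    return bool(values[predicted_state] < threshold)

# Toy example: states 0 and 1 are normal operation, state 2 drifts
# toward state 3, which represents a faulty operating condition.
reward = np.array([1.0, 1.0, 0.5, -1.0])       # stand-in for the learned reward
transitions = [(0, 1), (1, 0), (2, 3), (3, 3)]  # hypothetical observed dynamics
values = td0_state_values(transitions, reward, n_states=4)

fault_in_state_3 = diagnose(values, predicted_state=3, threshold=0.0)
fault_in_state_0 = diagnose(values, predicted_state=0, threshold=0.0)
```

In this toy setting the normal states accumulate positive value and the faulty state accumulates negative value, so a threshold of zero separates them; in practice the threshold would have to be set from the system's own data.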