While fuel vehicles promote economic development and raise production efficiency, they also bring environmental pollution and energy-supply problems. In September 2020, at the United Nations General Assembly, China announced its "carbon peaking" and "carbon neutrality" goals, which place new requirements on the automobile industry, and developing new energy vehicles has become a broad consensus. Fuel cell vehicles relieve the range anxiety of pure electric vehicles and avoid the tailpipe emissions that gasoline-electric hybrids still produce, and so have been called the "ultimate solution" for transportation. To improve the dynamic response of the system and recover braking energy, fuel cell vehicles generally adopt a fuel cell + battery hybrid architecture, in which the limited lifetime of the fuel cell is one of the main obstacles to wider adoption. An energy management strategy that coordinates the power output of the different sources to meet the vehicle's power demand, while saving energy, reducing emissions, and extending the fuel cell's lifetime, is therefore of great significance for the promotion of fuel cell vehicles.

Common energy management strategies can be divided into rule-based and optimization-based strategies. Rule-based strategies run well in real time but deliver only moderate control performance; optimization-based strategies perform well but are difficult to run in real time. With the development of artificial intelligence, energy management strategies built on reinforcement learning and deep reinforcement learning have become a research hotspot because they combine near-optimal performance with real-time operation. In addition, existing energy management strategies mainly focus on improving fuel economy and preventing the
battery from overcharging and over-discharging, while ignoring the impact of fuel cell degradation on vehicle cost. This thesis therefore designs three reinforcement-learning-based energy management strategies for a fuel cell hybrid vehicle: a Q-learning-based strategy, a deep Q-learning-based strategy, and a deep deterministic policy gradient-based strategy. A dynamic programming algorithm computes the optimal fuel economy for each driving cycle, which serves as the benchmark for comparing the strategies. Using prior knowledge of the fuel cell system's efficiency curve, the fuel cell is kept in its high-efficiency range. Guided by a fuel cell degradation-rate model, the reward signal penalizes fast load changes, idling, and start-stop events so as to reduce the degradation rate and extend the fuel cell's lifetime. The proposed strategies are trained offline on the UDDS, WLTC, and Japan 10-15 cycles and applied online to the NEDC, and are evaluated along several dimensions: algorithm stability, fuel economy, fuel cell lifetime improvement, and adaptability to driving conditions.

In the Q-learning-based strategy, the change in demanded power is modeled as a Markov process and its transition probability matrix is computed; the algorithm is pre-initialized with simple rules, and the environment is fully explored with an ε-greedy policy. The results show that the pre-initialized Q-learning algorithm converges faster and more stably. The strategy's fuel economy is within about 10% of the optimum, and it improves the fuel cell lifetime to a certain extent. Online application shows that the strategy can adapt to different driving cycles. Q-learning is suited to low-dimensional state problems, with low computational cost and good
convergence. However, as the number of vehicle state variables grows, computing a value function over discrete state-action pairs becomes intractable. By approximating the value function with a neural network, a deep Q-learning-based energy management strategy is designed that can handle energy management in a high-dimensional continuous state space. Experience replay reduces the correlation between samples, and prioritized experience replay raises the sampling probability of important samples, improving the algorithm's efficiency. On the network side, a target value network, a second network with identical structure, is updated with a delay to stabilize training. Simulation results show that the proposed strategy improves fuel economy by about 4% over the Q-learning-based strategy, smooths the fuel cell's power fluctuations, and reduces the number of fuel cell start-stop events. Online application shows good control performance on unseen driving cycles as well. Compared with Q-learning, the deep Q-learning-based strategy's control performance is greatly improved and the algorithm converges well. However, deep Q-learning must compute the action that maximizes the Q value, so it cannot avoid action-discretization error. For this reason, an energy management strategy based on the deep deterministic policy gradient is designed to achieve continuous control. A target value network resolves the self-dependence problem during network updates; experience replay and prioritized experience replay reduce sample correlation and improve sampling efficiency; and adding random noise to the actions balances exploration and exploitation. The results show that the proposed algorithm converges significantly faster and more stably than the deep Q-
learning algorithm. In terms of fuel economy, the proposed strategy is within about 3% of the optimum; it adapts to unknown driving cycles and delivers outstanding control performance while significantly reducing power fluctuations. This research explores the application of reinforcement learning to fuel cell vehicle energy management. The results show that the proposed strategies perform well in algorithm convergence, fuel economy, fuel cell lifetime improvement, and driving-condition adaptability, and can be further extended toward practical application.
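The Q-learning design described above, with discretized demanded power and fuel-cell power, ε-greedy exploration, and a reward that penalizes hydrogen use, fast load changes, and near-idle operation, can be illustrated with a minimal sketch. Everything here is a placeholder assumption: the power bins, cost coefficients, and the random demand model stand in for the thesis's actual transition probability matrix, efficiency curve, and degradation-rate model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretization (illustrative values, not from the thesis):
# state = index of the demanded-power bin, action = index of the fuel-cell power bin.
P_DEMAND = np.array([0.0, 10.0, 20.0, 30.0])  # kW bins of demanded power
P_FC     = np.array([0.0, 10.0, 20.0, 30.0])  # kW bins of fuel-cell output

def reward(p_dem, p_fc, p_fc_prev):
    """Toy reward: penalize hydrogen use, battery make-up power,
    fast load changes, and near-idle stack operation."""
    h2_cost   = 0.02 * p_fc                        # hydrogen consumption proxy
    batt_cost = 0.05 * abs(p_dem - p_fc)           # battery covers the difference
    ramp_cost = 0.02 * abs(p_fc - p_fc_prev)       # fast load-change penalty
    idle_cost = 0.5 if 0.0 < p_fc < 5.0 else 0.0   # near-idle operation penalty
    return -(h2_cost + batt_cost + ramp_cost + idle_cost)

n_s, n_a = len(P_DEMAND), len(P_FC)
Q = np.zeros((n_s, n_a))
alpha, gamma, eps = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(500):
    s, p_prev = rng.integers(n_s), 0.0
    for _ in range(50):
        # epsilon-greedy action selection
        a = rng.integers(n_a) if rng.random() < eps else int(Q[s].argmax())
        r = reward(P_DEMAND[s], P_FC[a], p_prev)
        s_next = rng.integers(n_s)  # demand modeled as a random process for brevity
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        p_prev, s = P_FC[a], s_next

# The learned policy picks a fuel-cell power level for each demand bin.
policy = Q.argmax(axis=1)
```

With these coefficients the learned policy lets the fuel cell track high power demands rather than leaving them entirely to the battery, which mirrors the role the transition matrix and rule-based pre-initialization play in the actual strategy.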
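Two DDPG ingredients mentioned above, random exploration noise on a deterministic continuous action and delayed (soft) updates of a target network, can likewise be sketched. The linear "actor", the power bound, and the update rate τ are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical tiny "network": a linear policy mapping state features to a
# continuous fuel-cell power command (sizes and bounds are illustrative).
n_features, p_fc_max = 3, 30.0   # feature count, kW upper bound on fuel-cell power

actor_w        = rng.normal(size=n_features)  # online actor weights
actor_w_target = actor_w.copy()               # target actor weights (delayed copy)
tau = 0.005                                   # soft-update rate

def act(w, state):
    """Deterministic policy: squash a linear score into [0, p_fc_max]."""
    return p_fc_max / (1.0 + np.exp(-w @ state))

def act_exploring(w, state, noise_std=2.0):
    """Exploration: add Gaussian noise to the deterministic action, then clip."""
    a = act(w, state) + rng.normal(0.0, noise_std)
    return float(np.clip(a, 0.0, p_fc_max))

def soft_update(w_target, w_online, tau):
    """Polyak averaging: the target network slowly trails the online network."""
    return (1.0 - tau) * w_target + tau * w_online

# After a (hypothetical) gradient step moves the online weights...
actor_w += 0.1
w_before = actor_w_target.copy()
actor_w_target = soft_update(actor_w_target, actor_w, tau)
```

Because the target network moves only a fraction τ per step, the bootstrapped targets change slowly, which is the mechanism behind the faster and more stable convergence reported for the DDPG-based strategy.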