
Offline Reinforcement Learning Algorithms And Their Applications On Industrial Control

Posted on: 2022-04-29    Degree: Master    Type: Thesis
Country: China    Candidate: H R Xu    Full Text: PDF
GTID: 2518306602992979    Subject: Computer Science and Technology
Abstract/Summary:
Reinforcement Learning (RL) has achieved great success in solving complex tasks, including games and robotics. However, most RL algorithms learn good policies only after millions of trials and errors in simulation environments. Although off-policy RL algorithms are considered able to leverage any data to learn skills, past work has shown that they cannot be directly applied to static datasets, due to the extrapolation error of the Q-function caused by out-of-distribution actions; this error cannot be eliminated without a growing batch of online samples. Offline Reinforcement Learning, also known as batch RL or data-driven RL, aims to solve the aforementioned problems by learning effective policies solely from offline static data, without any additional online interaction.

Offline RL algorithms can be categorized into model-free and model-based methods. Model-free methods tackle the problem by ensuring that the learned policy stays "close" to the behavior policy via behavior regularization, achieved by adding a regularization term that measures some divergence between the learned policy and the behavior policy. Model-based methods adopt a pessimistic MDP framework, in which the reward is penalized wherever the learned dynamics model cannot make an accurate prediction. However, current model-free and model-based methods are not yet good enough to be deployed. Model-free methods use exact behavior regularization, which can be overly restrictive. Model-based methods depend largely on the quality of the learned dynamics model while ignoring information contained in the offline data. Moreover, both families of methods ignore the challenges raised by real-world applications, including satisfying safety constraints and modeling stochastic systems with large state spaces.

In this work, we therefore propose simple yet effective improvements to address these problems. On the model-free side, we propose an improved offline RL framework, SBQ. SBQ relaxes the divergence constraint when a state has broad action support, yielding a soft version of behavior regularization: the policy retains full flexibility to choose actions within the support while still being prevented from taking out-of-distribution actions. On the model-based side, we propose a new offline RL framework called MORE, which is trained by combining real historical data with carefully filtered and processed simulation data through a novel restrictive exploration scheme. Experimental results show that SBQ and MORE match or outperform state-of-the-art approaches on offline RL benchmark datasets. SBQ and MORE have also been successfully deployed in two large coal-fired thermal power plants in China. Real-world experiments show that the optimized control strategies provided by our algorithms effectively improve the combustion efficiency of thermal power units while reducing pollutant emissions.
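For context, a minimal sketch of the two standard formulations the abstract refers to is given below. The exact regularizer, penalty weight, and uncertainty measure used by SBQ and MORE are not stated in the abstract, so the symbols D, alpha, lambda, and u(s,a) are illustrative assumptions rather than the thesis's own definitions.

% Model-free behavior regularization: keep the learned policy pi close to the
% behavior policy pi_beta that generated the offline dataset D_off; D is some
% divergence (e.g. KL) and alpha a penalty weight (both illustrative).
\[
\max_{\pi}\; \mathbb{E}_{s \sim \mathcal{D}_{\mathrm{off}},\, a \sim \pi(\cdot \mid s)}\!\big[Q(s,a)\big]
\;-\; \alpha\, \mathbb{E}_{s \sim \mathcal{D}_{\mathrm{off}}}\!\big[D\big(\pi(\cdot \mid s)\,\big\|\,\pi_{\beta}(\cdot \mid s)\big)\big]
\]

% Model-based pessimistic MDP: penalize the learned model's reward by an
% uncertainty estimate u(s,a) of its dynamics prediction, weighted by lambda.
\[
\tilde{r}(s,a) \;=\; \hat{r}(s,a) \;-\; \lambda\, u(s,a)
\]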
Keywords/Search Tags:Reinforcement Learning, Offline Reinforcement Learning, Reinforcement Learning Applications, Complex Industrial Control, Combustion Optimization of Thermal Power Units