Sample Efficiency Improvement Method Of Deep Reinforcement Learning And Its Application In Video Bitrate Control

Posted on: 2022-05-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Yang
Full Text: PDF
GTID: 1488306323464194
Subject: Information and Communication Engineering
Abstract/Summary:
Deep reinforcement learning (DRL), which combines reinforcement learning with deep learning, is one of the most active research directions in artificial intelligence and has produced many successful applications across a range of fields. However, DRL generally suffers from high sample complexity, especially model-free DRL, which is the more universal and practical variant. Because of inherent requirements such as incremental parameter adjustment, these algorithms need a large number of samples to learn, which poses a great challenge to their practical application, especially in scenarios where samples are costly to acquire. To address the sample complexity of model-free DRL, this dissertation designs core algorithms that improve sample efficiency under the framework of episodic reinforcement learning, and studies how to apply them to the specific problem of video bitrate control. The three main research contributions of the dissertation are as follows:

(1) To address the limitations of existing episodic memory models in terms of speed and stability, an improved episodic memory (EM) model is proposed that integrates a fast reward propagation mechanism, table-based Q-learning, and N-step learning. This model combines rapid reward propagation with stable updates, and its design reduces the storage and computational complexity of the model. To address the fact that EM is used in only a single way under the existing episodic DRL framework, a framework that makes comprehensive use of multiple EM models within DRL is proposed. The proposed model is applied to the core algorithm design of the three key components that affect the adjustment of DRL parameters (exploration strategy, experience replay, and loss function), thereby improving the sample efficiency of DRL. Experimental results show that the DRL algorithm based on the multiple EM model significantly improves the sample efficiency of DRL.

(2) Neither traditional DRL nor DRL based on the multiple EM model makes full use of the association information between samples, and the EM model of the latter cannot evaluate new states. To address these problems, this dissertation proposes a sample efficiency enhancement algorithm based on a directed associative graph (DAG), which represents the overall association relationships among all episodes. By planning along the directed edges of the DAG, the algorithm forms another learning system for estimating the state-action value function. The target deep neural network, the EM, and DAG-based planning are thus three learning systems, each generating a state-action value function that estimates the DRL target value from a different perspective. Finally, the algorithm uses these target value estimates as supervision values to compute separate loss signals, which are then used together to update the parameters of the deep neural network. Experiments show that the DAG-based sample efficiency enhancement algorithm further improves the sample efficiency of DRL compared with the algorithm based on the multiple EM model proposed in (1).

(3) Video bitrate control (adaptive bitrate, ABR) algorithms under the DRL framework have particularly demanding requirements for sample efficiency. Exploiting the fact that a video session has two stages, initial buffering and formal playback, this dissertation proposes an ABR algorithm based on episodic sub-memory DRL. The algorithm divides the video session into two sub-episodes: initial buffering and formal playback. These two sub-episodes are modeled and optimized separately, and are finally unified in a reward function model that takes the initial buffer into account. With the goal of maximizing the user's Quality of Experience (QoE), training yields an ABR algorithm that accounts for sample efficiency, algorithm performance, and the initial buffer at the same time. Simulation results show that, compared with ABR algorithms based on traditional DRL and with direct application of the methods from (1) and (2), the proposed algorithm improves sample efficiency during training and also comprehensively improves both the user's subjective QoE and the objective video quality metrics, including performance in the initial buffering stage.

The research in this dissertation is evaluated on a combination of a standard benchmark commonly used in academia (Atari 2600 games) and a simulation data set for the specific application of video bitrate control, which better demonstrates both the generality of the sample efficiency improvement algorithms and their scalability to practical applications. The work provides a useful reference for the design of sample efficiency improvement algorithms for DRL, and suggests a direction for the practical application and wider adoption of such algorithms.
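
The episodic memory idea behind contribution (1) can be illustrated with a minimal tabular sketch. This is not the dissertation's improved EM model, whose details the abstract does not give; it is the generic episodic-control style update, in which a table keeps the best discounted return observed so far for each state-action pair and rewards are propagated backward through a finished episode in a single pass. The function name `update_episodic_memory`, the dictionary layout, and the toy states and actions are all assumptions made for illustration.

```python
def update_episodic_memory(memory, episode, gamma=0.99):
    """Propagate rewards backward through one finished episode.

    memory  : dict mapping (state, action) -> best discounted return seen so far
              (hypothetical layout, chosen for this sketch)
    episode : list of (state, action, reward) tuples in time order
    gamma   : discount factor
    """
    ret = 0.0
    # Walking the episode in reverse lets the final reward reach every
    # earlier step in one pass, instead of one step per update.
    for state, action, reward in reversed(episode):
        ret = reward + gamma * ret
        key = (state, action)
        # Keep the maximum return: the table remembers the best outcome
        # ever obtained from this state-action pair.
        if key not in memory or ret > memory[key]:
            memory[key] = ret
    return memory


memory = {}

# A toy three-step episode that ends with a reward of 1.
good_episode = [("s0", "right", 0.0), ("s1", "right", 0.0), ("s2", "right", 1.0)]
update_episodic_memory(memory, good_episode, gamma=0.5)

# A later, worse episode through the same pairs does not lower the stored values.
bad_episode = [("s0", "right", 0.0), ("s1", "right", 0.0)]
update_episodic_memory(memory, bad_episode, gamma=0.5)
```

In a DRL setting, values read from such a table can serve as an additional target for the network or guide exploration and replay, which is the general role the abstract assigns to EM. How the dissertation combines the table with Q-learning and N-step learning is not specified here.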
Keywords/Search Tags:Artificial Intelligence, Reinforcement Learning, Deep Reinforcement Learning, Sample Efficiency, Episodic Memory, Directed Associative Graph, Video Bitrate Control, Adaptive Bitrate