
Research Of Adaptive Bitrate Algorithm Based On Deep Reinforcement Learning

Posted on: 2023-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Yi
Full Text: PDF
GTID: 2568306785964529
Subject: Computer Science and Technology
Abstract/Summary:
Recent years have seen rapid growth in HTTP-based video streaming. At the same time, viewers’ demand for video quality is steadily increasing. Video players use adaptive bitrate (ABR) algorithms to improve users’ quality of experience (QoE). Existing ABR algorithms suffer from frequent buffering, video freezing, low image quality, and inaccurate prediction of network throughput, so this thesis applies deep learning and reinforcement learning (RL) methods, focusing on the efficiency of ABR algorithms. The main work is as follows:

(1) Training neural networks with existing RL methods leads to large fluctuations in the reward value and to slow, difficult convergence. To address this, an ABR algorithm based on deep reinforcement learning (NABR) is proposed. First, NABR limits how far the new policy may move from the old policy in each update, avoiding the convergence difficulties caused by overly large updates. Second, NABR uses a baseline function to reduce the policy gradient variance. At the same time, the trust region method is used to find the optimal ABR policy. Finally, NABR adds an entropy loss to the policy network to encourage the agent to explore randomly and so increase the cumulative reward. The experimental results show that, compared with existing methods, NABR converges faster, is more robust, and further improves the user’s QoE. In addition, the influence of different neural network structures on the performance of the RL-based ABR algorithm is analyzed through experiments.

(2) Existing RL methods require a large number of training samples and cannot converge quickly, so the learned ABR algorithm generalizes poorly and cannot adapt to different network bandwidths. Computing the policy gradient also produces high variance, which causes convergence difficulties and other problems. To address this, a meta-learning-based ABR algorithm (LABR) is proposed. LABR uses meta-learning to train the RL policy network and learns an optimal loss function from a small number of samples, so it needs only a small number of task samples to converge quickly and is more efficient and stable, thereby improving the generalization of the ABR algorithm and further improving the QoE. The effectiveness of the LABR algorithm is verified by experiments.

(3) In existing ABR algorithms with fixed QoE parameters, RL generates the ABR model by training against a fixed reward, so an improvement in one index comes with a decline in another. For example, increasing the weight coefficient of video quality increases the video freeze time, while increasing the weight coefficient of the freeze time reduces the video quality. To address this, an ABR algorithm based on constrained Bayesian optimization (BABR) is proposed. BABR uses constrained Bayesian optimization to tune the QoE weights, improving video quality and reducing freeze time so that video quality, freeze time, and the other parameters reach the best combination. The experimental results show that, compared with existing methods, BABR achieves a better balance among the weights of the QoE indicators and ultimately a higher QoE.

(4) The deployment and application of the NABR, LABR, and BABR algorithms in adaptive streaming media systems are studied. Each algorithm is deployed in the video player, and the ABR algorithm requests video stored on a Linux server over HTTP to verify the validity of the algorithm. The experiments are evaluated in 4G and Wi-Fi network environments. The experimental results show that the QoE metric of the NABR algorithm is improved by 3.8%–9.4%.
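As a rough illustration of the ingredients listed in item (1), the Python sketch below combines a clipped probability ratio (one common way, used in PPO, to limit how far the new policy moves from the old one), a baseline-subtracted advantage, and an entropy bonus. The abstract does not give NABR's actual loss, so this should not be read as the thesis's formulation: the function name `clipped_policy_loss`, the parameters `clip_eps` and `entropy_coef`, and their default values are all assumptions made for illustration.

```python
import math


def clipped_policy_loss(new_logp, old_logp, returns, values, entropies,
                        clip_eps=0.2, entropy_coef=0.01):
    """PPO-style loss (to be minimized) averaged over a batch of samples.

    All names and default values are hypothetical; they are not taken
    from the thesis.
      new_logp / old_logp : log-probability of the chosen bitrate action
                            under the new and old policies
      returns             : observed discounted returns
      values              : baseline estimates for the same states
      entropies           : entropy of the new policy at each state
    """
    total = 0.0
    for lp_new, lp_old, ret, val, ent in zip(
            new_logp, old_logp, returns, values, entropies):
        # Baseline subtraction lowers the variance of the gradient estimate.
        advantage = ret - val
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Clipping bounds how much a single update can change the policy.
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        surrogate = min(ratio * advantage, clipped * advantage)
        # The entropy bonus rewards exploration; negate to get a loss.
        total += -(surrogate + entropy_coef * ent)
    return total / len(new_logp)


if __name__ == "__main__":
    # One sample whose action became more likely and had a positive advantage.
    loss = clipped_policy_loss(
        new_logp=[-0.5], old_logp=[-1.5],
        returns=[2.0], values=[1.0], entropies=[1.2])
    print(f"loss = {loss:.4f}")
```

With a positive advantage, the clip caps the gain once the ratio exceeds `1 + clip_eps`, so the optimizer has no incentive to push the policy further in one step; with a negative advantage the unclipped term dominates, so harmful moves are still fully penalized.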
Keywords/Search Tags: adaptive bitrate algorithms, quality of experience, reinforcement learning, meta-learning, constrained Bayesian