
Adaptive Bitrate Method For Streaming Media Based On Deep Reinforcement Learning

Posted on: 2024-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Feng
Full Text: PDF
GTID: 2568306944459574
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet and communication technology, the number of mobile devices is continuously increasing. Traffic generated by video services has grown significantly and has become the main source of Internet traffic. At the same time, users’ demand for video service quality has increased significantly. To adapt to dynamically changing network conditions and provide a high quality of experience (QoE) for users, the client uses an adaptive bitrate algorithm that dynamically selects the bitrate of each video chunk, taking the estimated network throughput and the buffer duration as input. However, adaptive bitrate streaming faces several challenges: inaccurate bandwidth prediction caused by the variability of network throughput, conflicts among the optimization objectives that make up user QoE, and the cascade effect caused by bitrate decisions.

In recent years, deep reinforcement learning has been applied to the adaptive bitrate streaming problem. Methods of this kind use a deep neural network to process historical transmission information, which allows the policy to be optimized according to the characteristics of the network. They effectively improve transmission efficiency and average bitrate, reduce rebuffering time and quality switching, and improve QoE to some extent.

However, existing research has some limitations. First, training against a QoE objective with fixed parameters achieves good performance under the original parameters, but when the parameters change the algorithm performs poorly and does not generalize well. Second, the reward function used as the optimization objective combines factors that conflict with each other, so it is difficult to adjust the policy flexibly across multiple metrics, which degrades gradient update performance. Third, the QoE optimization model used in existing studies is composed of objective QoS indicators, which cannot fully reflect users’ perceived quality, and the impact of dynamic buffer changes on video quality has not been fully considered.

In view of the above background and challenges, this thesis studies a bitrate adaptive method based on learning intrinsic reward and a quality perception bitrate adaptive method oriented to QoE model optimization, including:

(1) The bitrate adaptive method based on learning intrinsic reward. First, the transmission model of the adaptive video stream is defined, and the optimization target is defined as the user QoE composed of average bitrate, rebuffering time and smoothness. Second, to address the degradation of gradient update performance caused by complex and conflicting task objectives, an intrinsic reward module is designed to strengthen the agent's learning and understanding of environmental states, so that its decisions are more forward-looking. On this basis, a bitrate adaptive method based on learning intrinsic reward is proposed to solve the task model. The results show that this method can achieve higher QoE.

(2) The quality perception bitrate adaptive method oriented to QoE model optimization. First, the QoE optimization model is reconstructed by combining QoS-derived objective indicators with a video quality representation model. Second, the weight of the rebuffering penalty term in the optimization model is defined as a variable. A dynamic weight switching method for the training process is proposed for this variable, so that changes in the rebuffering penalty weight drive changes in the direction of the bitrate decision. The reconstructed QoE optimization model is used for training. Comparison and evaluation verify that this method can improve users’ QoE based on quality perception and has a positive impact on the various underlying indicators.
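The abstract names the three components of the QoE target (average bitrate, rebuffering time and smoothness) but does not state a formula. As a rough illustration only, the sketch below assumes the linear form that is common in the adaptive bitrate literature: total bitrate, minus a weighted rebuffering penalty, minus a weighted penalty on bitrate changes between consecutive chunks. The function name `linear_qoe`, the parameters `rebuf_weight` and `smooth_weight`, their default values and the units (Mbps, seconds) are all hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def linear_qoe(bitrates_mbps: Sequence[float],
               rebuffer_s: Sequence[float],
               rebuf_weight: float = 1.0,    # illustrative default, not from the thesis
               smooth_weight: float = 1.0    # illustrative default, not from the thesis
               ) -> float:
    """Assumed linear QoE over one playback session.

    bitrates_mbps[i] is the bitrate chosen for chunk i,
    rebuffer_s[i] is the stall time incurred while fetching chunk i.
    """
    if len(bitrates_mbps) != len(rebuffer_s):
        raise ValueError("one rebuffering value is required per chunk")

    # Reward term: sum of the selected bitrates.
    quality = sum(bitrates_mbps)
    # Penalty term: total time spent rebuffering.
    stall = sum(rebuffer_s)
    # Penalty term: absolute bitrate change between consecutive chunks.
    switches = sum(abs(cur - prev)
                   for prev, cur in zip(bitrates_mbps, bitrates_mbps[1:]))

    return quality - rebuf_weight * stall - smooth_weight * switches
```

Exposing `rebuf_weight` as an argument mirrors the idea in part (2) of treating the rebuffering penalty weight as a variable: evaluating the same session with a larger weight yields a lower score, which would push a learned policy toward more conservative bitrate choices. How the thesis actually switches this weight during training is not described in the abstract.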
Keywords/Search Tags: adaptive bitrate streaming, quality of experience, deep reinforcement learning, quality perception, intrinsic reward