Streaming video traffic accounts for the largest share of global Internet traffic. To achieve smooth playback under varying network conditions, the client video player uses an adaptive bitrate (ABR) algorithm to dynamically select the bitrate of each video chunk. The goal of this algorithm is to adapt the video bitrate to the underlying network conditions so as to maximize the user's quality of experience (QoE). In recent years, ABR algorithms based on reinforcement learning (RL) have been proposed and have become mainstream. However, the reward functions used by existing methods are often hand-crafted without an empirical basis, or are insufficiently accurate, so these methods may deliver a viewing experience that does not match users' expectations. This paper proposes an ABR algorithm based on user trajectory preferences, which aims to optimize video users' QoE directly from user data. The main work and contributions are as follows:

1. Existing QoE collection methods require users to score their viewing experience after watching a video. However, users who lack background knowledge of adaptive video streaming find it difficult to give an accurate quantitative score for a given viewing session, which introduces errors. To address this shortcoming, this paper proposes user trajectory preference and collects the corresponding data: after watching two different playbacks, a user simply chooses the better one, without giving a quantitative score.

2. A user QoE prediction model based on a multi-layer perceptron and user trajectory preferences is proposed, and its structure and training method are described. After training on user trajectory preference data, the model's predictions are closer to users' real QoE than both the reward functions used by existing RL-based ABR algorithms and the state-of-the-art learning-based QoE prediction method.

3. By using the aforementioned QoE prediction model for deep RL
training as the reward signal, an ABR algorithm based on user trajectory preferences is proposed. This approach avoids the blindness of hand-crafted reward modeling in RL, so that the ABR algorithm is trained toward satisfying user needs.

Experimental results show that, compared with the latest QoE function, the output of the proposed QoE prediction model correlates more strongly with the QoE data in existing datasets. The model's accuracy in predicting user preferences is also about 13.6% higher on average, and it performs well across different RL algorithms. Compared with other RL-based ABR algorithms, the user trajectory preference ABR algorithm improves average user QoE by about 16.4%.
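The pairwise-preference training idea in contribution 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature layout, network sizes, and function names are all assumptions. A small MLP maps a playback trajectory's features to a scalar QoE score, and each labelled comparison contributes a Bradley-Terry style cross-entropy loss that pushes the preferred trajectory's score above the other's.

```python
import numpy as np

# Sketch of preference-based QoE model training (illustrative names,
# not the paper's implementation). A small MLP scores a trajectory;
# pairs are compared so the user-preferred one should score higher.

rng = np.random.default_rng(0)

# Assumed trajectory summary: per-chunk features such as
# (bitrate, rebuffering time, bitrate-switch magnitude), flattened.
N_FEATURES = 12   # e.g. 4 chunks x 3 features (assumed layout)
HIDDEN = 16

W1 = rng.normal(0.0, 0.1, (N_FEATURES, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, 1))
b2 = np.zeros(1)

def qoe_score(x):
    """MLP mapping a trajectory feature vector to a scalar QoE score."""
    h = np.maximum(0.0, x @ W1 + b1)      # ReLU hidden layer
    return float((h @ W2 + b2)[0])

def preference_loss(x_a, x_b, a_preferred):
    """Cross-entropy on P(A preferred) = sigmoid(score_A - score_B)."""
    p_a = 1.0 / (1.0 + np.exp(-(qoe_score(x_a) - qoe_score(x_b))))
    return -np.log(p_a) if a_preferred else -np.log(1.0 - p_a)

# One labelled comparison: the user preferred trajectory A.
traj_a = rng.normal(size=N_FEATURES)
traj_b = rng.normal(size=N_FEATURES)
loss = preference_loss(traj_a, traj_b, a_preferred=True)
print(round(loss, 4))
```

Minimizing this loss over many comparisons fits the scalar score to the preference data without ever asking users for a numeric rating, which is exactly the property contribution 1 motivates.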
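Contribution 3 swaps a fixed-form reward for the learned model inside the RL training loop. The sketch below shows the substitution point only; the coefficients, feature choice, and function names are assumptions, and a fixed linear scorer stands in for the trained model.

```python
import numpy as np

# Sketch of using a learned QoE model as the RL reward (hypothetical
# names). A common hand-crafted ABR reward has the fixed form
#   r = bitrate - alpha * rebuffer - beta * |bitrate switch|;
# here each chunk is instead scored by a trained QoE model.

def handcrafted_reward(bitrate, rebuffer, last_bitrate,
                       alpha=4.3, beta=1.0):
    """Fixed-form QoE reward of the kind used by RL-based ABR methods."""
    return bitrate - alpha * rebuffer - beta * abs(bitrate - last_bitrate)

def learned_reward(qoe_model, bitrate, rebuffer, last_bitrate):
    """Reward from a trained QoE model; qoe_model is any callable
    mapping per-chunk features to a scalar score."""
    features = np.array([bitrate, rebuffer, abs(bitrate - last_bitrate)])
    return qoe_model(features)

# Stand-in for the trained model: a fixed linear scorer that happens
# to reproduce the hand-crafted weights, so both rewards coincide.
toy_model = lambda f: float(f @ np.array([1.0, -4.3, -1.0]))

# One simulated chunk: 2.5 Mbps chosen, 0.1 s rebuffer, previous 1.2 Mbps.
r_fixed = handcrafted_reward(2.5, 0.1, 1.2)
r_learned = learned_reward(toy_model, 2.5, 0.1, 1.2)
print(r_fixed, r_learned)
```

The design point is that `learned_reward` is a drop-in replacement at the environment-step boundary, so the same policy-gradient or actor-critic training code can be reused unchanged with either reward.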