
Research On Multi-armed Bandit Aided Online Learning Approach For Wireless Caching Strategy

Posted on: 2021-01-25  Degree: Master  Type: Thesis
Country: China  Candidate: T Chen  Full Text: PDF
GTID: 2428330623968173  Subject: Communication and Information System
Abstract/Summary:
As the computing and storage capabilities of smart terminals increase, a large number of new services are emerging. Mobile data traffic, dominated by video services, is growing rapidly as a result, which poses a major challenge to existing wireless communication networks. To meet this ever-increasing traffic demand, researchers have proposed a new edge network architecture that exploits the fact that popular content is requested repeatedly within a short time: popular content can be stored in advance in a small base station (sBS) close to the user. When a user requests content that the sBS has already cached, the sBS can serve the user directly. This architecture not only relieves pressure on the existing core network but also reduces the user's request delay. In recent years, the design of caching strategies has become a research hotspot. This thesis studies the design of caching strategies for the sBS in different scenarios when content popularity is unknown. Using multi-armed bandit (MAB) theory as the mathematical tool, the work is organized as follows.

First, the caching strategy design for single-objective popular content is studied. With content popularity unknown, we formulate the sBS caching problem as a combinatorial multi-armed bandit (Combinatorial-MAB, CMAB) model: the sBS corresponds to the decision maker (the bandit), the popular content items correspond to the arms, the demand for a content item corresponds to the reward of its arm, and the act of pulling arms corresponds to the sBS caching popular content. The algorithm is designed by combining a greedy strategy with the Upper Confidence Bound (UCB) strategy, and we verify that its regret is sub-linear.

Second, the caching strategy design for multi-objective popular content is studied. With content popularity unknown, the payment for paid content is considered as a second metric, which gives rise to a multi-objective content caching problem. We formulate this problem as a multi-objective multi-armed bandit (Multi-Objective-MAB, MO-MAB) model and propose two multi-objective online learning algorithms, one based on linear weighting and the other on the Pareto principle. Simulation results show that both algorithms outperform existing algorithms, and the regret of both algorithms is shown to be sub-linear.

Finally, the caching strategy design for sets of multi-objective popular content is studied. Building on the multi-objective caching setting above, we consider the problem of caching a set of content items and formulate it as a combinatorial multi-objective multi-armed bandit (Combinatorial MO-MAB, CMO-MAB) model. We propose an online learning algorithm for multi-objective proactive caching based on the extra Pareto principle and prove that its regret is sub-linear. Simulation results show that the proposed algorithm obtains higher cumulative rewards than existing algorithms, making it better suited to complex multi-objective communication environments.
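The single-objective CMAB formulation described in the abstract can be illustrated with a short sketch. This is not the thesis's algorithm, whose details are not given here; it is a generic combinatorial UCB loop under stated assumptions: each content item's demand in a round is a Bernoulli draw with an unknown request probability, the sBS observes demand only for the items it cached (semi-bandit feedback), and the exploration constant 1.5 is an arbitrary illustrative choice. The function name `cucb_cache` and all parameters are hypothetical.

```python
import math
import random


def cucb_cache(popularity, cache_size, horizon, seed=0):
    """Illustrative combinatorial UCB caching loop (assumed model, see lead-in).

    popularity : true request probabilities, hidden from the learner
    cache_size : number of items the sBS can cache per round
    horizon    : number of caching rounds to simulate
    """
    rng = random.Random(seed)
    n = len(popularity)
    counts = [0] * n    # how many times each item has been cached
    means = [0.0] * n   # empirical mean demand of each item
    total_reward = 0.0
    cached = []

    for t in range(1, horizon + 1):
        def ucb_index(i):
            # Items never cached get an infinite index so they are tried first.
            if counts[i] == 0:
                return float("inf")
            bonus = math.sqrt(1.5 * math.log(t) / counts[i])
            return means[i] + bonus

        # Greedy step: cache the items with the largest UCB indices.
        cached = sorted(range(n), key=ucb_index, reverse=True)[:cache_size]

        # Semi-bandit feedback: demand is observed only for cached items.
        for i in cached:
            reward = 1.0 if rng.random() < popularity[i] else 0.0
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]
            total_reward += reward

    return cached, counts, total_reward


if __name__ == "__main__":
    cached, counts, total = cucb_cache(
        [0.9, 0.8, 0.1, 0.05, 0.02], cache_size=2, horizon=2000
    )
    print("last cached set:", cached)
    print("times cached per item:", counts)
    print("cumulative reward:", total)
```

In this sketch the confidence bonus shrinks as an item is cached more often, so unpopular items are cached only occasionally after the early rounds; that is the mechanism behind sub-linear regret in UCB-style methods.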
Keywords/Search Tags: wireless caching, content popularity profile, online learning, multi-armed bandit (MAB)