Research On Resource Allocation And Optimization Strategy Based On Multi-Armed Bandit In Wireless Networks

Posted on:2023-07-02

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J W Tong

Full Text:PDF

GTID:1528306623984249

Subject:Communication and Information System

Abstract/Summary:


With an increasingly complex international environment and severe competitive pressure, the development of 6th Generation Mobile Networks (6G) has become a national strategy in countries around the world. 6G will introduce higher spectrum, energy, and coverage efficiency, more robust security, new enabling technologies, and ubiquitous intelligence, making the network highly dense, heterogeneous, and intelligent. However, 6G will also bring several challenges for resource allocation and optimization in wireless networks, such as high-density heterogeneous networks and reliable, low-latency applications. This thesis focuses on spatial reuse in full-duplex carrier sensing multiple access (FD-CSMA) networks, distributed resource allocation in heterogeneous cellular networks, device scheduling in high-density IoT networks, and link adaptation in underwater acoustic communication networks. To overcome these challenges, we put forth a series of allocation and optimization strategies that combine communication theory, optimization theory, game theory, data-driven technology, and multi-armed bandit (MAB) technology, with the aim of realizing the vision of next-generation wireless communication networks and providing theoretical and technical support. The main contributions of this thesis are summarized as follows.

(1) To solve the spatial reuse problem in FD-CSMA networks, this thesis proposes a stochastic and adversarial optimal (SAO) FD-CSMA algorithm that allocates the optimal transmit power (TP), carrier sensing threshold (CST), and logarithmic access intensity (LAI) to each FD link by combining optimization theory and MAB technology. First, the spatial reuse problem is decomposed into a link scheduling problem in the MAC layer and a parameter selection problem in the physical layer. For the scheduling problem, this thesis designs an LAI adaptive algorithm using the Lagrange multiplier method and the subgradient descent method. For the parameter selection problem, we model it as a multi-player adversarial MAB framework and propose an SAO-based MAB algorithm to find the optimal TP and CST for each FD link. Then, the optimal TP, CST, and LAI on each FD link can be determined by solving these two sub-problems in alternating iterations. Finally, the theoretical result provides an upper bound on the regret of the proposed algorithm. In addition, the numerical results show that the proposed algorithm can improve the network throughput by 43% compared with the random selection method.

(2) To handle the joint reconfigurable intelligent surface (RIS) and spreading factor (SF) allocation problem in distributed heterogeneous networks, this thesis proposes an Exploration and Exploitation Boosting (E2Boost) algorithm that combines non-cooperative game theory and MAB technology. First, the joint selection problem is modeled as a two-stage multi-player MAB (MPMAB) problem. The first MPMAB problem is to find the best RIS for each IoT device using the epsilon-greedy algorithm and the non-cooperative game method. The second MPMAB problem is to determine the best SF for each IoT device using the Thompson sampling (TS) algorithm. The optimal RIS and SF can then be obtained by alternately iterating between the two MPMAB stages. Finally, the theoretical result provides an upper performance bound for the E2Boost algorithm. In addition, the numerical results show that the performance of the proposed algorithm is improved by M times compared with the existing distributed allocation strategy, where M is the number of SFs of each IoT device.

(3) To deal with the age-of-information (AoI) based scheduling problem in high-density IoT networks, this thesis proposes a generalized Whittle index-based scheduling strategy that combines the Markov decision process (MDP) and MAB technology. First, the AoI minimization problem is formulated as a correlated restless MAB (CRMAB) by considering the correlation among IoT devices. Then, the CRMAB problem is decoupled into several one-dimensional sub-problems using decomposition theory. For the stochastically identical channel model, this thesis derives a closed-form expression of the generalized Whittle index (GWI) by solving the Bellman equation of each sub-problem and proposes a GWI-based scheduling strategy. For the stochastically non-identical channel model, this thesis derives a closed-form expression of the generalized partial WI (GPWI) and proposes a GPWI-based scheduling strategy. Finally, the theoretical result provides two lower performance bounds for the proposed GWI- and GPWI-based scheduling strategies. In addition, the simulation results show that the proposed scheduling strategies can significantly outperform the existing AoI-based scheduling algorithms in high-density networks.

(4) To address the link adaptation problem in underwater acoustic communication, this thesis proposes a joint transmission frequency and rate selection strategy with a fast convergence rate and low computational complexity by combining the model features and MAB technology. We first model the problem as a single-player MAB framework based on online learning theory. For the stationary channel model, this thesis proposes a unimodal objective-based TS (UO-TS) algorithm that exploits the two-dimensional unimodal feature of the objective function. For the non-stationary channel model, this thesis proposes a hybrid change detection (HCD) based TS (HCD-UO-TS) algorithm to jointly track the slowly varying channel and detect abrupt change points. For settings with a large action space that lack the unimodal feature, this thesis proposes an iterative boundary-shrinking (IBS) based TS (IBS-TS) algorithm built on a logistic regression-based action classification model. Finally, the theoretical result provides an upper bound on the regret of the UO-TS algorithm and shows that its convergence rate is about log₂(MN/5) times faster than that of traditional MAB algorithms, where M and N are the numbers of transmission frequencies and data rates, respectively.
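For readers unfamiliar with the Thompson sampling (TS) step named in contributions (2) and (4), the following is a minimal textbook sketch of Beta-Bernoulli TS for a single player. It is not the thesis's E2Boost, UO-TS, HCD-UO-TS, or IBS-TS algorithm; the function name `thompson_sampling`, the arm success probabilities, the horizon, and the seed are all hypothetical values chosen for illustration.

```python
import random


def thompson_sampling(success_probs, horizon, seed=0):
    """Textbook Beta-Bernoulli Thompson sampling (illustrative only).

    success_probs: hypothetical per-arm success probabilities, unknown
                   to the learner and used only to simulate rewards.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    alpha = [1] * n_arms  # Beta parameter: 1 + observed successes
    beta = [1] * n_arms   # Beta parameter: 1 + observed failures
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one sample from each arm's posterior and play the largest.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward for the chosen arm.
        reward = 1 if rng.random() < success_probs[arm] else 0

        # Update the posterior of the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Hypothetical example: three arms, the third being the best.
pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
print(pulls)
```

In a link adaptation setting, each arm could stand for one (frequency, rate) pair and the reward for a successful packet delivery; that mapping is an assumption made here for illustration only.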

Keywords/Search Tags:

Wireless communication network, resource allocation and optimization, multi-armed bandit, optimization theory


Related items

1	Small Base Station Resource Allocation Based On Multi-armed Bandit Theory
2	Optimization Method And Application Of Combinatorial Multi-Armed Bandit With Fairness Constraint
3	Study On Relay Selection Algorithm Based On Multi-Armed Bandit In Underwater Acoustic Cooperative Communication Networks
4	Software Test Resource Allocation Based On Adaptive Operator Selection
5	Auction Based Resource Allocation For Wireless Virtualization In 5G Networks
6	Study On Distributed Resource Allocation Optimization Algorithm For Wireless Sensor Network
7	Dynamic Spectrum Access In Wireless Networks
8	Research On Resource Management Mechanism Based On Multi-Armed Bandit In Edge Computing
9	Research On Resource Allocation Based On Reinforcement Learning In Multi-Beam Satellite Communications
10	Research On Channel Selection Mechanism Based On Multi-armed Bandit In Cognitive Network