
Research On Resource Management Using Deep Reinforcement Learning In B5G Communication Networks

Posted on: 2023-05-28 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: S Y Wang
GTID: 1528306914976629 | Subject: Information and Communication Engineering
Abstract/Summary:
With the global commercialization and deployment of fifth-generation (5G) communication networks, research on beyond-fifth-generation (B5G) communication networks has been widely carried out. B5G networks will evolve towards higher throughput, lower latency, higher reliability, more connections, and higher spectral efficiency. The key factor constraining these goals is the rational allocation and scheduling of communication resources. B5G networks aim to create a highly digital, intelligent, globally data-driven information society and to build an autonomous ecosystem with human-like intelligence and awareness; hence, self-innovation based on intelligent techniques is an essential capability of B5G networks. Deep reinforcement learning (DRL)-based resource management has received considerable attention and development because of the flexibility, effectiveness, and intelligence that DRL brings to resource management.

This dissertation targets typical application scenarios of B5G communication networks, takes DRL as the core technique, and starts from three key requirement characteristics of B5G networks: high mobility, high spectral efficiency, and high personalization. It solves three typical resource management problems in the data link layer, physical layer, and network layer: 1) the dynamic multiple channel access (DMCA) problem over fast time-varying channels in the data link layer; 2) the joint user pairing, subcarrier scheduling, and power allocation problem for multicarrier non-orthogonal multiple access (MC-NOMA) systems in the physical layer; and 3) the joint virtual network function (VNF) placement and routing problem for network function virtualization (NFV)-based networks in the network layer. The three problems progress from shallow to deep, the optimization objectives from simple to complex, and the techniques used from easy to difficult. This dissertation fully considers the differences in user personalization across the problems, takes throughput, delay, and cost as the optimization goals, designs a DRL-based solution framework adapted to each problem, and verifies the effectiveness of the proposed schemes from multiple dimensions. The main contributions and innovations of this dissertation are summarized as follows.

1. This dissertation proposes a learning-based DMCA framework to solve the DMCA problem over fast time-varying channels. To capture differences in user requirements, it first considers two traffic models and designs a psychology-based personalized quality-of-service model. Two throughput-maximization problems are then formulated, one for each traffic model. To address them, a novel prediction-based deep deterministic policy gradient (P-DDPG) algorithm is proposed, built on the DDPG algorithm. The learning-based DMCA framework consists of two modules: a channel prediction module (CPM), responsible for real-time, reliable prediction of channel gains, and a P-DDPG module, responsible for outputting the DMCA policy. In particular, the CPM employs incremental learning and long-term memory unit techniques. The dissertation also proposes the concept of a "virtual user" to decouple the neural network structure from the number of users, ensuring the stability and transferability of the framework in dynamic environments. Experiments verify that the policy output by the learning-based DMCA framework maintains a high, stable throughput over a future period of time, effectively mitigating the impact of non-immediate decision errors on the DMCA policy in fast time-varying channels.

2. This dissertation proposes a DRL-based joint resource management (DRL-JRM) framework to solve the joint subcarrier allocation (including user pairing) and power allocation problem in the MC-NOMA system. It first considers a more realistic downlink power-domain MC-NOMA system in which imperfect successive interference cancellation (SIC), together with power disparity and sensitivity constraints, is taken into account. A joint subcarrier and power allocation problem is then formulated with the objective of maximizing the weighted system throughput. In this problem, a varying number of users can be multiplexed on each subcarrier, and each user can occupy a varying number of subcarriers. Weight factors are also designed to represent differences in the priority of user demands. For this problem, the dissertation presents the basic process of transforming a multivariable optimization problem into an RL task and proposes two new reward mechanisms: a joint reward mechanism and an internal reward mechanism. On this basis, it designs the DRL-JRM framework, which consists of two alternating DRL modules that apply single-agent DRL and multi-agent DRL (MADRL) techniques, respectively. Given the characteristics of the problem, specific improvements are made to the input, hidden, and output layers of the neural networks in the framework, further improving its effectiveness and convergence. Experiments verify that the DRL-JRM framework achieves higher system throughput and stronger anti-interference capability, and can flexibly meet users' individual demands.

3. This dissertation proposes a multi-agent DRL-based placement and routing (MADRL-P&R) framework to determine VNF placement and routing policies in NFV-based networks. It first introduces the VNF placement and VNF routing problems in detail, considering four types of constraints and three types of network resources. The joint optimization of VNF placement and routing is then formulated with the objective of simultaneously minimizing deployment cost and service delay. The formulated problem is a multivariable, multi-objective optimization problem, in which delay-sensitive and cost-sensitive factors are introduced to characterize users' differentiated demands. To solve it, the dissertation decouples the problem into two iterative subtasks, each of which is transformed into multiple parallel sequential decision-making processes. On this basis, the MADRL-P&R framework is designed using MADRL techniques. In addition, to address the failure of MADRL-P&R caused by network topology changes (addition or deletion of nodes and/or links), a parameter-transfer-based model retraining method is proposed.

In summary, the resource management schemes designed for these research problems can help improve future technical schemes and optimization algorithms for B5G communication. The proposed frameworks and designs are highly scalable, and the ideas and methods for converting multivariable, multi-objective, and other joint optimization problems into RL tasks can also be transferred to other DRL-based resource management research.
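The abstract does not say how the "virtual user" concept in contribution 1 is realized. One common way to decouple a neural network's input and output size from the number of users is to fix an assumed maximum user count and fill the unused slots with placeholder users, as in the following minimal Python sketch. The function names, the per-user feature layout, and the zero-padding choice are all hypothetical and are not taken from the dissertation.

```python
from typing import Tuple

import numpy as np


def pad_with_virtual_users(
    user_features: np.ndarray, max_users: int
) -> Tuple[np.ndarray, np.ndarray]:
    """Pad an (n_users, n_features) state to a fixed (max_users, n_features) shape.

    Rows beyond n_users are hypothetical "virtual users" filled with zeros, so the
    policy network always sees the same input size. The returned boolean mask
    marks which rows belong to real users.
    """
    n_users, n_features = user_features.shape
    if n_users > max_users:
        raise ValueError("more real users than the fixed network input allows")
    padded = np.zeros((max_users, n_features), dtype=user_features.dtype)
    padded[:n_users] = user_features
    mask = np.zeros(max_users, dtype=bool)
    mask[:n_users] = True
    return padded, mask


def drop_virtual_actions(actions: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the actions the network produced for real users."""
    return actions[mask]


if __name__ == "__main__":
    # Three real users, each with two illustrative features
    # (e.g. a predicted channel gain and a queue length; both assumed).
    state = np.array([[0.8, 2.0], [0.3, 5.0], [0.6, 1.0]])
    padded_state, real = pad_with_virtual_users(state, max_users=5)
    print(padded_state.shape)  # (5, 2)
    actions = np.array([0.9, 0.1, 0.4, 0.7, 0.2])  # one output per slot
    print(drop_virtual_actions(actions, real))
```

Under this assumption, users can join or leave without changing the network architecture, as long as their number stays within `max_users`; the padded state can be flattened with `padded_state.reshape(-1)` when the network expects a vector.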
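To make the weighted-throughput objective of contribution 2 concrete, the sketch below computes the weighted rate of the users multiplexed on one downlink subcarrier under a commonly used imperfect-SIC model: users are decoded from weakest to strongest channel, each user cancels the signals of weaker users except for a residual fraction `eps`, and treats the signals of stronger users as interference. The model, the parameter `eps`, and all names are assumptions for illustration; the dissertation's exact formulation, including its power disparity and SIC sensitivity constraints, is not given in the abstract and is not enforced here.

```python
import math
from typing import List, Sequence, Tuple


def subcarrier_weighted_rate(
    gains: Sequence[float],    # channel gains |h|^2 of the users sharing this subcarrier
    powers: Sequence[float],   # transmit power allocated to each user on this subcarrier
    weights: Sequence[float],  # priority weight of each user
    noise: float,              # noise power on the subcarrier
    eps: float,                # assumed residual factor of imperfect SIC (0 = perfect SIC)
    bandwidth: float = 1.0,    # subcarrier bandwidth; 1.0 gives rates in bit/s/Hz
) -> Tuple[float, List[float]]:
    # Decode in ascending order of channel gain: the weakest user is decoded first,
    # so every stronger user can attempt to cancel the signals of all weaker users.
    order = sorted(range(len(gains)), key=lambda k: gains[k])
    rates = [0.0] * len(gains)
    for pos, k in enumerate(order):
        g = gains[k]
        # Signals of stronger users are not cancelled and act as interference.
        uncancelled = sum(powers[j] for j in order[pos + 1:])
        # Signals of weaker users were removed by SIC, but a fraction eps remains.
        residual = sum(powers[j] for j in order[:pos])
        sinr = powers[k] * g / (g * uncancelled + eps * g * residual + noise)
        rates[k] = bandwidth * math.log2(1.0 + sinr)
    weighted = sum(w * r for w, r in zip(weights, rates))
    return weighted, rates


if __name__ == "__main__":
    # Two users on one subcarrier; the weaker user gets more power, as is usual in NOMA.
    for residual_factor in (0.0, 0.1):
        total, per_user = subcarrier_weighted_rate(
            gains=[1.0, 4.0], powers=[0.8, 0.2], weights=[1.0, 1.0],
            noise=0.1, eps=residual_factor,
        )
        print(residual_factor, round(total, 3), [round(r, 3) for r in per_user])
```

Under this assumed model, the weighted system throughput is the sum of `weighted` over all subcarriers, and a nonzero `eps` lowers the rates of the stronger users, which is the effect imperfect SIC introduces. An RL agent choosing user pairings and power levels could receive this quantity, or a normalized version of it, as its reward.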
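For contribution 3, the following sketch evaluates one candidate service function chain deployment under a weighted cost-delay objective. It is a hypothetical illustration, not the MADRL-P&R method: the VNF placement is given as input, routing between consecutive VNF hosts uses plain Dijkstra shortest-delay paths as a stand-in for learned routing decisions, and `delay_weight` and `cost_weight` stand in for the delay-sensitive and cost-sensitive factors. The cost and delay components (per-node deployment cost, per-node processing delay, a flat per-link bandwidth cost) are assumptions.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical topology format: node -> {neighbor: link delay}.
Graph = Dict[str, Dict[str, float]]


def shortest_delay_path(graph: Graph, src: str, dst: str) -> Tuple[List[str], float]:
    """Dijkstra's algorithm on link delay; returns (node path, total delay)."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        raise ValueError(f"{dst} is unreachable from {src}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]


def evaluate_sfc(
    graph: Graph,
    ingress: str,
    egress: str,
    placement: Sequence[str],      # host node of each VNF, in chain order
    node_cost: Dict[str, float],   # assumed cost of deploying one VNF on a node
    proc_delay: Dict[str, float],  # assumed processing delay of one VNF on a node
    link_cost: float,              # assumed bandwidth cost per traversed link
    delay_weight: float,           # stand-in for the delay-sensitive factor
    cost_weight: float,            # stand-in for the cost-sensitive factor
) -> Tuple[float, List[str]]:
    """Return (weighted objective, full node route) for one placement."""
    hops = [ingress, *placement, egress]
    total_delay = sum(proc_delay[n] for n in placement)
    total_cost = sum(node_cost[n] for n in placement)
    route = [ingress]
    for a, b in zip(hops, hops[1:]):
        # Consecutive VNFs on the same node give a one-node path with zero delay.
        path, d = shortest_delay_path(graph, a, b)
        total_delay += d
        total_cost += link_cost * (len(path) - 1)
        route.extend(path[1:])
    objective = delay_weight * total_delay + cost_weight * total_cost
    return objective, route
```

A delay-sensitive request would use a larger `delay_weight`, a cost-sensitive one a larger `cost_weight`; the negative of the objective is one possible reward signal for placement and routing agents. The four constraint types and three resource types mentioned in the abstract are not modeled here.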
Keywords/Search Tags: Fast Time-Varying Channel Access, Multi-Carrier Non-Orthogonal Multiple Access, Network Function Virtualization, Joint Resource Management, Deep Reinforcement Learning, Multi-Agent Technique