
Resource Management Research Based on Markov Decision Processes in Wireless Networks

Posted on: 2020-04-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Z Li
Full Text: PDF
GTID: 1488306473484734
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of the mobile Internet and the Internet of Things, the number of wireless communication devices is exploding. These massive numbers of devices not only consume enormous energy but also place great demands on the bandwidth and computing resources of wireless communication networks. However, the bandwidth of wireless communication networks is limited, and simply investing more energy and computing resources can hardly keep pace with the growing demands of wireless devices while also raising operating costs and wasting resources. Reasonable resource management can greatly reduce resource overhead while still meeting the service requirements of wireless communication devices.

Resource management methods can be divided into instantaneous methods and long-term methods. Instantaneous methods do not consider the impact of the current decision on future decisions and therefore cannot capture resource scheduling gains over time. One class of long-term methods assumes that the environmental information influencing decisions is known over a future period and applies static optimization to solve for the optimal decision at each step; since the wireless communication environment is random, this assumption cannot be realized in practice. Another class assumes that the environmental information is random but independently and identically distributed across time instants, so the resulting decision policies ignore the temporal correlation of the wireless environment. We instead model resource management in wireless communications with Markov decision processes (MDPs), which assume that the environmental information at adjacent decision epochs has the Markov property and thus fully capture the correlation of the environment over time. MDPs are therefore effective models for stochastic sequential decision problems. In this dissertation, we use MDPs or semi-Markov decision processes (SMDPs) to model resource management in radio-frequency (RF) energy harvesting communications, renewable energy harvesting communications, and cloud-fog computing systems, and we solve for asymptotically optimal resource management policies with model-based planning algorithms or model-free reinforcement learning algorithms.

Firstly, we study the energy allocation problem of low-power sensors powered by a dedicated RF energy source. Because electromagnetic waves fade severely in the wireless channel, a dedicated RF energy source can supply sufficient RF energy only at close range. We consider two common sensor working modes: frequency-division multiplexing and time-division multiplexing. Since the RF energy transmission link is short and has a line-of-sight path, we model the energy transmission channel as Rician fading; the information transmission channel has only scattered paths, so we model it as Rayleigh fading. We discretize the power gains of the two channels and model each as a finite-state Markov chain. We then describe the energy harvesting and information transmission process as a discrete-time infinite-horizon discounted MDP and use the value iteration algorithm to search for the asymptotically optimal policy.
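As a concrete illustration of this first step, below is a minimal value iteration sketch for a toy discretization of the battery and channel states. All sizes, the harvested-energy distribution, the transition matrices, and the logarithmic rate reward are illustrative assumptions, not the dissertation's actual model.

```python
import numpy as np

# Toy sizes; the dissertation's discretization is not specified here.
B = 10          # battery energy levels 0 .. B-1
H = 4           # quantized channel power-gain states (finite-state Markov chain)
gamma = 0.95    # discount factor

rng = np.random.default_rng(0)
P_h = rng.dirichlet(np.ones(H), size=H)     # assumed FSMC transition matrix
p_harvest = np.array([0.3, 0.4, 0.3])       # assumed P(harvest 0, 1, 2 units)
gain = np.linspace(0.2, 2.0, H)             # representative channel power gains

def reward(e_tx, h):
    """Illustrative rate reward for spending e_tx energy units in channel state h."""
    return np.log1p(gain[h] * e_tx)

V = np.zeros((B, H))                        # value function over (battery, channel)
for _ in range(500):                        # value iteration sweeps
    V_new = np.empty_like(V)
    for b in range(B):
        for h in range(H):
            best = -np.inf
            for e in range(b + 1):          # cannot transmit more than is stored
                # Expectation over harvested energy and the next channel state.
                ev = sum(p_k * (P_h[h] @ V[min(b - e + k, B - 1)])
                         for k, p_k in enumerate(p_harvest))
                best = max(best, reward(e, h) + gamma * ev)
            V_new[b, h] = best
    if np.max(np.abs(V_new - V)) < 1e-6:
        V = V_new
        break
    V = V_new
```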
For the asymptotically optimal policy in the frequency-division multiplexing mode, we prove theoretically that the sensor's transmission energy is monotonically non-decreasing in the battery's energy state. We use this monotonicity to restrict the search range of the asymptotically optimal policy and thus reduce the complexity of the iterative algorithm. In contrast, the time-division multiplexing mode does not exhibit this monotonicity. Our simulations verify these conclusions and explain the reason for the difference.

Secondly, we study a solar-assisted heterogeneous network and propose a downlink packet scheduling strategy based on an SMDP. We analyze the factors affecting solar radiation intensity and model its evolution as a continuous-time Markov chain (CTMC), while downlink packet arrivals are modeled as Poisson processes. We derive the transition probability and the discounted transition probability of the composite decision state consisting of the battery state, the solar radiation intensity, and the triggering event, and we solve for the asymptotically optimal packet scheduling policy with the relative value iteration algorithm under the average criterion and the value iteration algorithm under the discounted criterion.

Then, we use an SMDP to model the virtual machine (VM) allocation problem of a cloud-fog computing system and solve for the asymptotically optimal VM allocation policy with both a model-based planning algorithm and a model-free reinforcement learning algorithm. Before the planning algorithm can run, the state transition probabilities and the expected time intervals between adjacent decision epochs must be trained. For a generic SMDP this model is quite difficult to train, especially when the state space or action space is large. To ease training, we degenerate the SMDP into a continuous-time Markov decision process (CTMDP); then only the service request arrival rates and the service completion rates need to be trained to derive the state transition probabilities and the expected inter-epoch times. In addition, we propose a model-free reinforcement learning method that approximates an optimal coordinated VM allocation policy by learning from the states and rewards fed back by the system. Simulation results show that the model-free method converges to a performance level similar to that of the model-based planning method.

Finally, we use a constrained SMDP to model the joint allocation of VMs and wireless bandwidth under time delay constraints in a wireless-access cloud-fog computing system. We use a multi-timescale actor-critic reinforcement learning algorithm to update the policy parameters, the value function parameters, and the Lagrange multipliers, so as to continuously improve the resource allocation policy under the constraints.
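The monotone structure proved for the frequency-division multiplexing mode can be exploited to prune the action search inside value iteration. Continuing the toy model above (same assumed names B, H, P_h, p_harvest, gamma, reward), a single sweep might look like the following; the pruning pattern, not the toy dynamics, is the point.

```python
import numpy as np  # continues the toy model above (B, H, P_h, p_harvest, gamma, reward)

def monotone_sweep(V):
    """One value-iteration sweep exploiting e*(b, h) non-decreasing in b."""
    V_new = np.empty_like(V)
    policy = np.zeros((B, H), dtype=int)
    for h in range(H):
        e_lo = 0                            # optimal action at the previous battery level
        for b in range(B):
            best, best_e = -np.inf, e_lo
            for e in range(e_lo, b + 1):    # search only e >= e*(b - 1, h)
                ev = sum(p_k * (P_h[h] @ V[min(b - e + k, B - 1)])
                         for k, p_k in enumerate(p_harvest))
                q = reward(e, h) + gamma * ev
                if q > best:
                    best, best_e = q, e
            V_new[b, h] = best
            policy[b, h] = best_e
            e_lo = best_e                   # monotonicity shrinks the next search range
    return V_new, policy
```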
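For the average-criterion scheduling study, the sketch below shows relative value iteration on a finite unichain MDP. Reducing the solar-scheduling SMDP to this form (e.g., by the standard data transformation) is assumed rather than shown, and P and r stand in for the derived transition and reward models.

```python
import numpy as np

def relative_value_iteration(P, r, ref=0, tol=1e-8, max_iter=10_000):
    """Average-criterion relative value iteration on a finite unichain MDP.

    P: (A, S, S) transition matrices; r: (S, A) one-step rewards.
    The SMDP is assumed already reduced to this discrete-time form.
    """
    A, S, _ = P.shape
    h = np.zeros(S)                              # relative values (bias)
    for _ in range(max_iter):
        Q = r + np.einsum('asj,j->sa', P, h)     # Q[s, a] = r(s, a) + E[h(next)]
        h_new = Q.max(axis=1)
        h_new -= h_new[ref]                      # pin the reference state at zero
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    Q = r + np.einsum('asj,j->sa', P, h)
    return Q.max(axis=1)[ref], Q.argmax(axis=1)  # (average gain, greedy policy)
```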
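For the model-free VM allocation method, a minimal tabular Q-learning sketch for a discounted SMDP is shown below. The env interface, the lump-sum reward at each decision epoch, and all hyperparameters are illustrative assumptions; the dissertation's actual learning method may differ.

```python
import numpy as np

def smdp_q_learning(env, n_states, n_actions,
                    beta=0.1, alpha=0.05, eps=0.1, n_steps=200_000):
    """Tabular model-free Q-learning for a discounted SMDP.

    `env` is an assumed interface: env.reset() -> state,
    env.step(a) -> (next_state, reward, sojourn_time).
    The discount over a sojourn of length tau is exp(-beta * tau).
    """
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    s = env.reset()
    for _ in range(n_steps):
        # epsilon-greedy exploration over VM-allocation actions
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2, r, tau = env.step(a)
        target = r + np.exp(-beta * tau) * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])   # temporal-difference update
        s = s2
    return Q
```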
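Lastly, the multi-timescale constrained actor-critic can be sketched as three coupled stochastic approximation updates on different step-size schedules: the critic fastest, the actor slower, and the Lagrange multiplier slowest. Everything below, including the feature maps, the env interface, and the step-size exponents, is an assumed minimal instantiation, not the dissertation's exact algorithm.

```python
import numpy as np

def constrained_actor_critic(env, phi_v, phi_pi, d_v, d_pi, n_actions,
                             gamma=0.99, n_steps=100_000, seed=0):
    """Multi-timescale actor-critic for a delay-constrained (S)MDP (sketch).

    Assumed interfaces: env.reset() -> s; env.step(a) -> (s2, reward, g), where
    g > 0 measures violation of the delay budget; phi_v(s) -> critic features
    of shape (d_v,); phi_pi(s) -> actor features of shape (n_actions, d_pi).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(d_v)        # critic parameters (fastest timescale)
    theta = np.zeros(d_pi)   # actor parameters (intermediate timescale)
    lam = 0.0                # Lagrange multiplier (slowest timescale)
    s = env.reset()
    for n in range(1, n_steps + 1):
        a_n, b_n, c_n = n ** -0.55, n ** -0.8, 1.0 / n   # step-size schedules
        feats = phi_pi(s)
        logits = feats @ theta
        p = np.exp(logits - logits.max()); p /= p.sum()  # softmax policy
        a = int(rng.choice(n_actions, p=p))
        s2, r, g = env.step(a)
        # TD error of the Lagrangian reward r - lam * g.
        delta = (r - lam * g) + gamma * phi_v(s2) @ w - phi_v(s) @ w
        w += a_n * delta * phi_v(s)                      # critic update
        theta += b_n * delta * (feats[a] - p @ feats)    # actor: score-function ascent
        lam = max(0.0, lam + c_n * g)                    # projected dual ascent
        s = s2
    return theta, w, lam
```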
Keywords/Search Tags: Markov decision processes, semi-Markov decision processes, resource management, RF energy harvesting communications, renewable energy harvesting communications, cloud-fog computing system, delay constraints, model-based planning algorithm