Study On Optimal Control In Communication Network Based On Reinforcement Learning

Posted on: 2005-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhang
GTID: 2168360152468050
Subject: Communication and Information System
Abstract/Summary:
Over the past decades, the rapid growth of Internet and mobile applications, especially real-time multimedia applications, has imposed diverse QoS (Quality of Service) requirements on communication networks. Maximizing network performance under these requirements calls for optimal control policies.

This thesis concentrates on the application of Markov decision processes (MDP) and reinforcement learning (RL) to communication network control, and investigates how to obtain optimal policies in several wireline and wireless network control environments. The concepts of Markov decision processes and reinforcement learning are introduced first. Then, based on a definition of the utility function of the network, the dynamic admission control problem is formulated as a discrete MDP and solved with the Q-Learning algorithm. To overcome the "curse of dimensionality", the measured available bandwidth is used as the state descriptor, rather than the number of flows as in most previous work. An adaptive method is also proposed to split the continuous bandwidth into discrete sets. This scheme is called measurement-based dynamic admission control (MBDAC). Simulation demonstrates that MBDAC can improve the total long-term utility of the network while meeting certain QoS constraints.

Next, we extend reinforcement learning to a continuous-state, continuous-action setting. A new learning algorithm with function approximation is proposed, and its convergence is proved. Based on this algorithm, we describe a new Active Queue Management scheme, Reinforcement Learning Gradient-Descent (RLGD). The key idea is to adjust the congestion measure and the performance measure separately and adaptively, without requiring knowledge of the rate adjustment scheme of the source sender. Its performance is evaluated and compared with the REM and PI controllers using the NS-2 simulator. The results show that the RLGD scheme is much more responsive and more robust to disturbance under various network conditions. The RL method is also applied to power control in ad hoc networks. Simulation demonstrates that power control can significantly improve the throughput of the whole network, and that the performance of our scheme approaches that of the optimal algorithm.

Finally, the thesis makes an initial attempt to combine information theory and queuing theory, which appears to be an alternative and promising way to address QoS problems in communication networks. Some preliminary results are presented.
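As a rough illustration of the admission control formulation described in the abstract, the sketch below maps a measured available bandwidth to a discrete state and applies one tabular Q-Learning update. It is not the MBDAC algorithm itself: the fixed bin edges, the learning rate and discount factor, the reward value, and all function names are assumptions made for this example, and the thesis's adaptive splitting method is replaced here by a plain fixed partition.

```python
import random


def bandwidth_to_state(available_bw, bin_edges):
    """Map a measured available bandwidth to a discrete state index.

    The state is the number of bin edges that the measurement reaches.
    Fixed edges are an assumption; the thesis splits bandwidth adaptively.
    """
    state = 0
    for edge in bin_edges:
        if available_bw >= edge:
            state += 1
    return state


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step.

    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice between action 0 (reject) and 1 (admit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])


# Hypothetical setup: bandwidth in Mbit/s, split at three fixed edges,
# giving four states with two actions each.
BIN_EDGES = [2.0, 5.0, 8.0]
q_table = [[0.0, 0.0] for _ in range(len(BIN_EDGES) + 1)]

rng = random.Random(0)
s = bandwidth_to_state(3.5, BIN_EDGES)       # measurement before the decision
a = 1                                        # admit the arriving flow
s_next = bandwidth_to_state(5.0, BIN_EDGES)  # measurement after the decision
new_value = q_update(q_table, s, a, reward=1.0, next_state=s_next)
greedy = choose_action(q_table, s, epsilon=0.0, rng=rng)
```

Using bandwidth bins as the state keeps the table size equal to the number of bins times the number of actions, regardless of how many flows are active, which is the dimensionality argument the abstract makes.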
Keywords/Search Tags: Communication Network, Optimal Control, Markov Decision Process, Reinforcement Learning, Dynamic Admission Control, Active Queue Management, Power Control, Information Theory, Queuing Theory