Traditional routing algorithms build a mathematical model of the network environment to calculate packet transmission paths. Such models are difficult to extend when the network structure changes and are poorly suited to modern, highly dynamic network environments. Dynamic routing optimization requires routing agents to quickly perceive changes in network topology and network state, dynamically adjust their routing policies, and maintain good communication quality, which is of great significance for modern communication networks. Deep Reinforcement Learning (DRL), an effective method for solving problems in dynamic environments, is widely used in network communication. Without prior information about the environment, DRL learns the underlying regularities of the environment by interacting with it under the guidance of a preset reward function, and it can efficiently solve problems with complex state and action spaces. Compared with mathematical methods, it is simple to deploy and takes effect quickly. Applying DRL to dynamic routing optimization enables routing devices to perceive network environment information in a targeted manner, intelligently select packet transmission paths, reduce packet transmission delay and packet loss rate, and improve network communication quality.

In this thesis, we first study the centralized dynamic routing optimization problem and propose a spatiotemporal-correlation dynamic routing optimization algorithm based on the Twin Delayed Deep Deterministic Policy Gradient (TD3). The algorithm designs a performance evaluation method that accommodates performance differences arising from both the network topology and the network infrastructure equipment, uses a convolutional neural network to extract the spatial correlation features of the network topology, uses a recurrent neural network to extract the temporal correlation features of network traffic data, and designs an attention mechanism to identify the parts of these features that matter most for performance. Simulation results show that the improved algorithm outperforms the standard TD3 algorithm, the centralized deep Q-routing algorithm, and the congestion-controlled confidence Q-routing algorithm (Congestion Credence Q-Routing, C2Q-routing), effectively reducing the packet loss rate and the average communication delay.

We then study the distributed dynamic routing optimization problem and propose a multi-agent dynamic routing optimization algorithm based on federated learning. In distributed routing, the traffic data passing through the network is often non-public; the algorithm therefore uses federated learning to share training results among agents, avoiding the privacy and security risks caused by exchanging data samples between agents. Simulation results show that the proposed algorithm outperforms MADDPG, Multi-Actor-Attention-Critic (MAAC), and deep Q-routing with independent policies in terms of packet loss rate and average communication delay, improving overall network communication efficiency.
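For readers unfamiliar with TD3, the sketch below shows how its critic target differs from plain DDPG: clipped double-Q learning (take the smaller of two target critics) and target policy smoothing (add clipped Gaussian noise to the target action). It is a minimal illustration, not the thesis's implementation. The function name `td3_target`, the toy callables standing in for the target networks, and the assumption of a continuous action vector bounded in `[action_low, action_high]` are all hypothetical; the abstract does not specify the action space, and the spatiotemporal state encoder and the performance-evaluation reward are not modeled here.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical type aliases: the target actor maps a state to an action vector,
# each target critic maps (state, action) to a scalar Q-value.
Actor = Callable[[Sequence[float]], List[float]]
Critic = Callable[[Sequence[float], Sequence[float]], float]


def td3_target(
    reward: float,
    next_state: Sequence[float],
    done: bool,
    target_actor: Actor,
    target_q1: Critic,
    target_q2: Critic,
    gamma: float = 0.99,
    noise_std: float = 0.2,
    noise_clip: float = 0.5,
    action_low: float = 0.0,
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Compute the TD3 critic target y for one transition (sketch)."""
    rng = rng if rng is not None else random.Random()

    # Target policy smoothing: add clipped Gaussian noise to each action
    # component, then clip the result back into the valid action range.
    smoothed = []
    for a in target_actor(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        smoothed.append(max(action_low, min(action_high, a + eps)))

    # Clipped double-Q learning: use the smaller of the two target critics
    # to curb the overestimation bias of a single critic.
    q_next = min(target_q1(next_state, smoothed), target_q2(next_state, smoothed))

    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (0.0 if done else q_next)
```

Both critics would then be regressed toward this target. TD3's third ingredient, updating the actor and the target networks less often than the critics (for example, once every two critic steps), belongs to the training loop and is not shown.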
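The abstract states that an attention mechanism weights the extracted features by their importance to performance, without giving its form. As a minimal sketch under that gap, the code below uses dot-product attention pooling: each feature vector (for example, one recurrent hidden state per time step) is scored against a query vector, the scores are normalized with a softmax, and the features are combined into a single context vector. The function name `attention_pool` and the fixed `query` argument are hypothetical; in a trained model the query would be a learned parameter, and the thesis may use a different scoring function.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(
    features: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Dot-product attention pooling over a list of feature vectors (sketch)."""
    # Score each feature vector by its dot product with the query vector.
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]

    # Numerically stable softmax: subtract the maximum score before exp().
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the feature vectors.
    dim = len(features[0])
    context = [sum(w * f[j] for w, f in zip(weights, features)) for j in range(dim)]
    return weights, context
```

Features with higher scores receive larger weights, so the downstream actor and critic see a summary dominated by the time steps (or topology regions) the query deems most relevant.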
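The abstract says agents share training results rather than raw traffic samples but does not name the aggregation rule. A common choice is federated averaging (FedAvg), in which model parameters are averaged with weights proportional to each agent's local sample count; the sketch below assumes that rule and a hypothetical representation of parameters as a dictionary of named flat lists. It illustrates the idea only and is not necessarily the aggregation used in the thesis.

```python
from typing import Dict, List, Sequence

# Hypothetical parameter format: layer name -> flattened list of weights.
Params = Dict[str, List[float]]


def federated_average(agent_params: Sequence[Params], sample_counts: Sequence[int]) -> Params:
    """Sample-weighted average of agents' parameters, FedAvg style (sketch)."""
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample_counts must sum to a positive number")

    averaged: Params = {}
    for name in agent_params[0]:
        length = len(agent_params[0][name])
        # Each agent contributes in proportion to how much local data it trained on.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(agent_params, sample_counts)) / total
            for i in range(length)
        ]
    return averaged
```

In each round, every routing agent trains on its own private traffic, sends only its parameters for aggregation, and continues from the averaged model, so no data samples leave the agent.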