| In recent years, Reinforcement Learning (RL), an important class of machine learning algorithms, has attracted growing attention because of its ability to learn from experience. In particular, with the rapid development of deep learning, Deep Reinforcement Learning (DRL) has emerged and shows strong application potential. Although DRL has been successful in many dynamic networks, it still faces challenges in practical applications, such as the dynamic tracking problem in UAV-assisted communication networks and the high-dimensional dynamic fleet management problem on ride-sharing platforms. To address these problems, this dissertation designs and proposes DRL schemes that solve practical dynamic network problems and further improve dynamic network performance. The main contributions and innovations of this dissertation are as follows: (1) For the dynamic UAV-assisted communication network, this dissertation proposes a dynamic UAV trajectory design scheme based on a constrained DRL algorithm, which greatly improves the downlink transmission rate of the communication network. Specifically, the dynamic three-dimensional trajectory design problem of UAVs under a coverage constraint is formulated as a constrained Markov decision process, and a constrained DRL algorithm is proposed to solve it. The goal of the proposed algorithm is to ensure that all Ground Terminals (GTs) are covered while improving the transmission rate of the communication network. To satisfy the coverage constraint, the primal variables and dual variables are trained in turn by the primal-dual method. In addition, to reduce the action space of the proposed DRL algorithm, this dissertation proposes an action filter mechanism that eliminates invalid actions by using prior information. Experimental results show that, following the movement strategies obtained by the proposed DRL algorithm, UAVs can track the randomly roaming GTs
while ensuring coverage and improve the downlink transmission rate of the communication network. (2) For the dynamic vehicle dispatching network, this dissertation proposes a dynamic multi-vehicle dispatching scheme based on a rewriting DRL algorithm, which improves the Order Response Rate (ORR) of ride-sharing platforms. The proposed rewriting DRL algorithm consists of a DRL module and a rewriting module. The DRL module makes dynamic vehicle dispatching decisions based on the captured dynamic changes in traffic supply and demand, and takes the Kullback-Leibler (KL) divergence between the traffic supply distribution and the traffic demand distribution as incentive feedback. The rewriting module learns how to use a simplified yet effective RL Q-table to improve the vehicle dispatching strategies made by the DRL module. To evaluate the proposed dynamic vehicle dispatching algorithm, a traffic simulator is designed for training and testing. Experimental results show that the proposed algorithm can improve the ORR by 3.86% compared with existing vehicle dispatching methods. (3) For the dynamic order dispatching network, this dissertation proposes a dynamic order dispatching scheme based on a multi-objective reward learning algorithm, which improves both the Average Driver Income (ADI) and the ORR. In this scheme, a multi-objective reward parameterized Q-learning (MRPQ-Learning) algorithm is proposed to measure the value of each driver-order pair, defined as the value of the driver serving a specific order. Then, all drivers and orders are matched by a centralized matching algorithm to maximize the values of all driver-order pairs. In addition, to diversify traffic supply, virtual orders generated according to the predicted traffic demand are dispatched to idle drivers. Notably, the multi-objective reward in the MRPQ-Learning algorithm takes into account
both immediate reward (order price and pickup distance) and future revenue (future traffic demand at the order destination). To further promote the development of the ride-sharing platform ecosystem, this dissertation proposes an "outstanding driver, good reward" mechanism that encourages drivers to increase their online time and service quality. Experimental results show that, compared with existing order dispatching methods, the proposed method improves the ORR and ADI by 8.57% and 13.86%, respectively. |
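
As an illustration of the KL-based feedback described in contribution (2), the following minimal sketch computes a reward from per-zone supply and demand counts. It is not the dissertation's implementation. The function names, the direction of the divergence (supply relative to demand), the negative sign that turns the divergence into a reward, the smoothing constant `eps`, and the uniform fallback for empty counts are all assumptions made for illustration.

```python
import math


def normalize(counts):
    """Turn non-negative per-zone counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        # Assumption: fall back to a uniform distribution when nothing is observed.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """Discrete KL(p || q); eps (an assumed smoothing term) avoids division by zero."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of zones")
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0  # terms with p_i = 0 contribute nothing
    )


def dispatch_reward(supply_counts, demand_counts):
    """Hypothetical reward: higher (closer to 0) when supply matches demand."""
    supply = normalize(supply_counts)
    demand = normalize(demand_counts)
    return -kl_divergence(supply, demand)


if __name__ == "__main__":
    balanced = dispatch_reward([5, 5], [10, 10])
    skewed = dispatch_reward([9, 1], [5, 5])
    print(balanced, skewed)
```

Under these assumptions, a dispatching action that moves idle vehicles toward zones with unmet demand reduces the divergence and therefore raises the reward, which is the behaviour the abstract attributes to the DRL module.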