With the rapid development of wireless communication technology worldwide, wireless communication has become one of the main ways for people to convey information. In recent years, for real-time applications based on Internet of Things (IoT) systems, scholars have proposed the concept of age of information (AoI) to accurately characterize the freshness of the information collected by front-end IoT devices; AoI refers to the time taken for information generated at the source node to be successfully delivered to the destination node. This concept has quickly attracted widespread attention from both academia and industry. Intelligent reflecting surface (IRS)-assisted communication technology reshapes the wireless channel environment by adjusting the amplitude and phase shift of received incident signals. With the advantages of low deployment cost and low power consumption, this technology can significantly improve the performance of wireless communication systems. Simultaneous wireless information and power transfer (SWIPT), as an important form of radio frequency (RF) energy harvesting, can deliver information and energy to wireless devices in parallel, effectively alleviating the problem of energy-limited nodes. Wireless relay communication technology can extend the wireless communication distance at low cost, improve service quality, and guarantee communication for end users. By combining IRS-assisted communication, SWIPT, and relay communication at the wireless relay, the harvested RF energy can be used to power the integrated IRS, enabling flexible, low-cost joint deployment of the IRS and the relay to extend the range of wireless communication. This offers a new approach to improving the performance of wireless communication systems. Therefore, this thesis conducts in-depth research on the design and optimization of an RF energy-assisted, IRS-enabled wireless relay system, with the aim of minimizing the long-term average AoI.

Firstly, a
multi-mode energy harvesting and information forwarding protocol is designed to increase the energy collected by the IRS-assisted relay and to utilize the harvested energy efficiently. Under this protocol, the system can operate in one of three modes: direct transmission, hybrid energy harvesting and information forwarding, or pure energy harvesting. Secondly, based on the newly designed protocol, the system optimization problem is formulated, which aims to minimize the long-term average AoI from the source node to the destination node by jointly optimizing the system mode selection and the IRS reflection phase, while ensuring that the harvested energy meets the data packet forwarding requirements and that the data packets can be transmitted over the channel. Since the resulting problem is a difficult mixed-integer programming problem, it is solved using deep reinforcement learning (DRL). To this end, the optimization problem is first reformulated, over a finite-state Markov channel model, as a Markov decision process (MDP). Because the state space of the resulting MDP is very large, traditional reinforcement learning methods (such as Q-learning) cannot be used for the solution; therefore, a corresponding system optimization algorithm is proposed based on DRL. Finally, the performance of the proposed algorithm is validated through computer simulations. The simulation results show that, compared with the direct transmission scheme without any wireless relay or IRS, as well as the relay transmission scheme without an IRS, the system optimization algorithm proposed in the thesis can significantly reduce the long-term average AoI of the system.

Furthermore, although the learning model trained with the above DRL method can guide the agent well in a large state space, it has a limitation: the action space cannot be too large, otherwise convergence of the learning from the interaction between the agent and the environment cannot be guaranteed. Therefore, a novel two-stage hierarchical reinforcement learning (HRL) framework is further proposed, which uses two deep Q-networks (DQNs) to learn, respectively, the three working-mode decisions of the agent and the action decisions under the hybrid energy harvesting and information forwarding mode; this reduces the action space that each DQN needs to learn, thereby improving learning efficiency and speeding up the convergence of training. To this end, the optimization problem is first decomposed into two subtasks, outer-action learning and inner-action learning, and then a corresponding system optimization algorithm is developed based on the HRL approach. Finally, the performance of the proposed algorithm is verified by computer simulations. The simulation results show that the HRL algorithm achieves better convergence performance than the DRL algorithm.
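To make the two-stage decomposition concrete, the following is a minimal sketch of hierarchical action selection: an outer Q-network chooses one of the three working modes, and an inner Q-network chooses a further action only when the hybrid mode is selected. All sizes (state dimension, number of discretized IRS phases) and the linear Q-function stand-ins are illustrative assumptions, not the thesis's actual network architecture.

```python
import numpy as np

# Hypothetical sizes for illustration only (not taken from the thesis):
STATE_DIM = 6    # e.g. channel state, battery level, current AoI
N_MODES = 3      # 0: direct transmission, 1: hybrid EH + forwarding, 2: energy harvesting
N_PHASES = 8     # assumed discretization of the IRS reflection phase

rng = np.random.default_rng(0)

class LinearQ:
    """Stand-in for a DQN: a linear Q-value approximator with random weights."""
    def __init__(self, state_dim, n_actions):
        self.W = rng.normal(scale=0.1, size=(n_actions, state_dim))
    def greedy(self, s):
        return int(np.argmax(self.W @ s))

outer_q = LinearQ(STATE_DIM, N_MODES)    # learns the three working-mode decisions
inner_q = LinearQ(STATE_DIM, N_PHASES)   # learns phase actions in hybrid mode only

def select_action(state, eps=0.1):
    """Two-stage hierarchical epsilon-greedy action selection."""
    # Outer stage: pick one of the three working modes.
    mode = int(rng.integers(N_MODES)) if rng.random() < eps else outer_q.greedy(state)
    # Inner stage: only the hybrid mode requires a further action decision,
    # so each network faces a small action space (3 and 8, not 3 * 8).
    phase = None
    if mode == 1:
        phase = int(rng.integers(N_PHASES)) if rng.random() < eps else inner_q.greedy(state)
    return mode, phase

state = rng.normal(size=STATE_DIM)
mode, phase = select_action(state, eps=0.0)
```

The point of the decomposition is visible in `select_action`: the inner network is consulted only after the outer network commits to the hybrid mode, so neither network ever searches the full joint mode-phase action space.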