As an important component of Internet of Things technology, Wireless Sensor Networks (WSNs) have been widely deployed in various fields. However, the limited battery capacity of sensor nodes constrains the operational lifetime of a WSN: once a node runs out of energy, the monitoring quality of its area degrades. How to overcome this constraint and prolong the lifetime of the whole network is therefore an active research topic. Wireless charging is currently regarded as a promising solution. The traditional charging paradigm is single-node based, but this one-to-one paradigm suffers from poor scalability and inefficient charging. Fortunately, multi-node charging technology has emerged in recent years: a mobile charger can simultaneously replenish multiple sensor nodes within the same charging range, greatly increasing its charging capacity. To simplify computation, most existing multi-node charging algorithms pre-restrict the sojourn locations of the mobile charger, e.g., in a pre-partition or co-location manner.

Different from existing work, this paper focuses on achieving efficient multi-node charging without pre-restricting any sojourn locations. To this end, it introduces a new metric, the charging reward, to measure charging effectiveness, and formulates the effective charging problem as a charging reward maximization problem: a mobile charger with limited energy selects a series of sojourn locations and constructs a charging trajectory so that the reward accumulated over one charging cycle is maximized. Because the absence of preset restrictions makes the number of potential sojourn locations uncertain (even nearly infinite) and the charging strategy must be generated online, this paper adopts a reinforcement learning algorithm to tackle these problems. Unlike other charging strategies that use reinforcement learning, the model considers not only basic state information of the sensor nodes (e.g., residual lifetime and the relative positions of the nodes) but also the trajectory information of the mobile charger to guide the learning agent in selecting sojourn locations. On this basis, a reinforcement-learning-based multi-node charging algorithm (RLMC) is proposed to achieve efficient charging.

To further improve the efficiency of the charging algorithm and optimize network performance, this paper decouples the charging action of the learning agent to reduce the correlation between training samples in the reinforcement learning network, which helps reduce the possibility of Q-value overestimation. In addition, each training sample sequence is assigned a priority, and a weighted sampling algorithm based on a minimum heap is designed to select training samples for the network model according to these priorities. Combining these two techniques with the basic RLMC algorithm yields an efficient multi-node charging algorithm based on Double Q-learning with improved sampling (HWD-RLMC).

Finally, this paper conducts extensive simulation experiments to evaluate the performance of the two proposed algorithms. The experiments compare three aspects: total charging reward, expiration duration, and number of dead nodes, and the results verify the advantages of the proposed algorithms. This paper also analyzes the influence of parameters on network performance, and presents a series of analyses of the experimental process and results to better illustrate the effectiveness of the charging schemes generated by the reinforcement learning algorithm.
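The charging reward maximization described above can be illustrated with a minimal sketch: a charger with a fixed energy budget visits sojourn locations in sequence, paying movement and charging costs and accumulating reward until the budget runs out. All names and cost quantities here are hypothetical illustrations, not the paper's actual model.

```python
def trajectory_reward(rewards, move_costs, charge_costs, budget):
    """Accumulate charging reward along a candidate sojourn sequence.

    rewards[i]      -- hypothetical reward for sojourn location i
    move_costs[i]   -- hypothetical energy to travel to location i
    charge_costs[i] -- hypothetical energy spent charging at location i
    budget          -- the mobile charger's limited energy
    """
    total, energy = 0.0, budget
    for r, m, c in zip(rewards, move_costs, charge_costs):
        if energy < m + c:   # cannot afford the next sojourn: stop the cycle
            break
        energy -= m + c
        total += r
    return total
```

An online strategy, as in the paper, would choose the next location one step at a time rather than scoring a fixed sequence; this sketch only shows how reward accumulates under the energy constraint.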
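The overestimation issue mentioned above is the one Double Q-learning targets: using a single network both to select and to evaluate the next action biases targets upward. A standard Double-Q target, shown below as a generic sketch (not the paper's specific network), separates the two roles.

```python
import numpy as np

def double_q_target(q_online_next, q_target_next, reward, gamma, done):
    """Standard Double-Q target computation (generic sketch).

    The online network SELECTS the greedy next action, while the target
    network EVALUATES it; this decoupling curbs the overestimation that
    arises when one network performs both roles.
    """
    a_star = int(np.argmax(q_online_next))      # selection: online network
    if done:
        return reward                           # terminal: no bootstrap term
    return reward + gamma * q_target_next[a_star]  # evaluation: target network
```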
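One plausible reading of the min-heap-based weighted sampling is sketched below: the heap keeps the lowest-priority transition at the root so it can be evicted first when the buffer is full, and batches are drawn with probability proportional to priority. The class and its details are illustrative assumptions, not the paper's implementation.

```python
import heapq
import random

class PriorityReplayBuffer:
    """Illustrative priority replay buffer (assumed design, not the paper's).

    A min-heap keyed on priority keeps the least important transition at
    the root, so insertion into a full buffer evicts it in O(log n).
    Sampling is weighted by priority (with replacement, for simplicity).
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []       # entries: [priority, counter, transition]
        self.counter = 0     # tie-breaker so heapq never compares transitions

    def add(self, transition, priority):
        entry = [priority, self.counter, transition]
        self.counter += 1
        if len(self.heap) >= self.capacity:
            # Push the new entry, then pop the smallest-priority one.
            heapq.heappushpop(self.heap, entry)
        else:
            heapq.heappush(self.heap, entry)

    def sample(self, k):
        # Weighted sampling proportional to priority.
        weights = [entry[0] for entry in self.heap]
        idx = random.choices(range(len(self.heap)), weights=weights, k=k)
        return [self.heap[i][2] for i in idx]
```

A production design would also anneal priorities and apply importance-sampling corrections; this sketch only shows the heap-plus-weighted-sampling idea.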