
Optimized dynamic vehicle routing policies with applications

Posted on: 2013-06-21    Degree: Ph.D    Type: Thesis
University: Boston University    Candidate: Lin, Yingwei
Subject: Engineering
Abstract/Summary:
This dissertation addresses two applications: (a) optimizing dynamic vehicle routing policies for forklift dispatching in a warehouse, and (b) reward collection by a group of air vehicles in a three-dimensional mission space. For the first application, we deployed an inexpensive mobile wireless sensor network in a commercial warehouse served by a fleet of forklifts, with the goal of improving forklift dispatching and reducing the costs associated with delays in loading and unloading delivery trucks. The forklifts were instrumented with sensor nodes that collect an array of information in an event-driven manner, including each forklift's physical location, usage time, bumping/collision history, and battery status. A hypothesis-testing algorithm was implemented to extract the location information. Combined with inventory information, the acquired data was fed into an Actor-Critic stochastic optimization method to generate dispatching decisions.

For the second application, we considered mobile vehicles (agents) flying through a forest with obstacles. The agents "chase" potentially moving targets that carry rewards, which the agents collect by approaching the targets. We cast the problem in a Markov Decision Process framework. To seek an optimal policy that maximizes the long-term average reward collected, and to overcome the curse of dimensionality, we propose an approximate dynamic programming algorithm termed the Distributed Actor-Critic Algorithm. Motivated by the way animals move while hunting for food, we incorporate several bio-inspired features into the control policy structure. Simulation results demonstrate that policies with these bio-inspired features achieve a higher reward collection rate than their non-bio-inspired counterparts, by 40% in some examples. We also considered a setting in which the targets are intelligent and move away from the agents in order to minimize the reward collected; this problem is formulated as a Pursuit Evasion Game. Assuming that the targets also use an Actor-Critic method to optimize their control policy, we show that the game converges to a Local Nash Equilibrium. Furthermore, we propose an Actor-Critic with Simulated Annealing (ACSA) algorithm and establish that, under ACSA, the game converges to a Nash Equilibrium. Simulation results show that the ACSA algorithm achieves a higher reward collection rate for both stationary and moving targets.
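To make the Actor-Critic machinery referenced in both applications concrete, the following is a minimal tabular sketch of an actor-critic update loop. It is an illustration only, assuming a small toy state space, softmax policy, and TD(0) critic; the environment, reward values, and learning rates are hypothetical and are not taken from the dissertation.

```python
import numpy as np

n_states, n_actions = 25, 4                # hypothetical 5x5 grid, 4 moves
theta = np.zeros((n_states, n_actions))    # actor: policy parameters
value = np.zeros(n_states)                 # critic: state-value estimates
alpha_actor, alpha_critic, gamma = 0.05, 0.1, 0.99
rng = np.random.default_rng(0)

def softmax_policy(s):
    """Boltzmann (softmax) action probabilities for state s."""
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def step(s, a):
    """Toy transition: deterministic move with a reward at one target state."""
    s_next = (s + [1, -1, 5, -5][a]) % n_states
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

s = 0
for t in range(10_000):
    p = softmax_policy(s)
    a = rng.choice(n_actions, p=p)
    s_next, r = step(s, a)

    # Critic: TD(0) error and value update.
    delta = r + gamma * value[s_next] - value[s]
    value[s] += alpha_critic * delta

    # Actor: policy-gradient step scaled by the TD error
    # (gradient of log softmax with respect to the preferences of state s).
    grad_log = -p
    grad_log[a] += 1.0
    theta[s] += alpha_actor * delta * grad_log

    s = s_next
```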
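The abstract does not spell out how simulated annealing enters the ACSA algorithm; one plausible reading is that the actor's exploration is governed by a slowly cooling temperature applied to a Boltzmann policy. The sketch below shows only that idea; the cooling schedule, its parameters, and the action preferences are assumptions for illustration.

```python
import numpy as np

def boltzmann(prefs, temperature):
    """Softmax over action preferences at a given exploration temperature."""
    z = (prefs - prefs.max()) / max(temperature, 1e-8)
    p = np.exp(z)
    return p / p.sum()

def temperature_schedule(t, t0=1.0, cooling=1e-3):
    """Slow cooling: broad exploration early, near-greedy behavior later."""
    return t0 / (1.0 + cooling * t)

prefs = np.array([0.2, 0.0, -0.1, 0.4])    # hypothetical action preferences

for t in (0, 1_000, 10_000, 100_000):
    p = boltzmann(prefs, temperature_schedule(t))
    print(t, np.round(p, 3))               # distribution sharpens as T falls
```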
Keywords/Search Tags:Reward collection, Dynamic, Policies, Application, Algorithm, Targets