
Optimized dynamic vehicle routing policies with applications

Posted on: 2013-06-21    Degree: Ph.D    Type: Thesis
University: Boston University    Candidate: Lin, Yingwei
Subject: Engineering
Abstract/Summary:
This dissertation addresses two applications: (a) optimizing dynamic vehicle routing policies for forklift dispatching in a warehouse, and (b) reward collection by a group of air vehicles in a three-dimensional mission space. For the first application, we deployed an inexpensive mobile wireless sensor network in a commercial warehouse served by a fleet of forklifts, with the goal of improving forklift dispatching and reducing the costs associated with delays in loading and unloading delivery trucks. The forklifts were instrumented with sensor nodes that collect an array of information in an event-driven manner, including each forklift's physical location, usage time, bumping/collision history, and battery status. A hypothesis-testing algorithm was implemented to extract the location information. Combined with inventory information, the acquired data was fed into an Actor-Critic stochastic optimization method to generate dispatching decisions.

For the second application, we considered mobile vehicles (agents) flying through a forest with obstacles. The agents "chase" potentially moving targets that carry rewards, which the agents collect by approaching the targets. We cast the problem in a Markov Decision Process framework. To seek an optimal policy that maximizes the long-term average reward collected, and to overcome the curse of dimensionality, we propose an approximate dynamic programming algorithm termed the Distributed Actor-Critic Algorithm. Motivated by the way animals move while hunting for food, we incorporate several bio-inspired features into the control policy structure. Simulation results demonstrate that policies with these bio-inspired features achieve a higher reward collection rate than their non-bio-inspired counterparts, by 40% in some examples. We also considered a setting in which the targets are intelligent and move away from the agents in order to minimize the reward collected; this problem is formulated as a Pursuit Evasion Game. Assuming that the targets also use an Actor-Critic method to optimize their control policy, we show that the game converges to a Local Nash Equilibrium. Furthermore, we propose an Actor-Critic with Simulated Annealing (ACSA) algorithm and establish that, under ACSA, the game converges to a Nash Equilibrium. Simulation results show that the ACSA algorithm achieves a higher reward collection rate for both stationary and moving targets.
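To make the Actor-Critic machinery referenced in both applications concrete, the following is a minimal tabular sketch of an actor-critic update loop. It is an illustration only, assuming a small toy state space, softmax policy, and TD(0) critic; the environment, reward values, and learning rates are hypothetical and are not taken from the dissertation.

```python
import numpy as np

n_states, n_actions = 25, 4                # hypothetical 5x5 grid, 4 moves
theta = np.zeros((n_states, n_actions))    # actor: policy parameters
value = np.zeros(n_states)                 # critic: state-value estimates
alpha_actor, alpha_critic, gamma = 0.05, 0.1, 0.99
rng = np.random.default_rng(0)

def softmax_policy(s):
    """Boltzmann (softmax) action probabilities for state s."""
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def step(s, a):
    """Toy transition: deterministic move with a reward at one target state."""
    s_next = (s + [1, -1, 5, -5][a]) % n_states
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

s = 0
for t in range(10_000):
    p = softmax_policy(s)
    a = rng.choice(n_actions, p=p)
    s_next, r = step(s, a)

    # Critic: TD(0) error and value update.
    delta = r + gamma * value[s_next] - value[s]
    value[s] += alpha_critic * delta

    # Actor: policy-gradient step scaled by the TD error
    # (gradient of log softmax with respect to the preferences of state s).
    grad_log = -p
    grad_log[a] += 1.0
    theta[s] += alpha_actor * delta * grad_log

    s = s_next
```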
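The abstract does not spell out how simulated annealing enters the ACSA algorithm; one plausible reading is that the actor's exploration is governed by a slowly cooling temperature applied to a Boltzmann policy. The sketch below shows only that idea; the cooling schedule, its parameters, and the action preferences are assumptions for illustration.

```python
import numpy as np

def boltzmann(prefs, temperature):
    """Softmax over action preferences at a given exploration temperature."""
    z = (prefs - prefs.max()) / max(temperature, 1e-8)
    p = np.exp(z)
    return p / p.sum()

def temperature_schedule(t, t0=1.0, cooling=1e-3):
    """Slow cooling: broad exploration early, near-greedy behavior later."""
    return t0 / (1.0 + cooling * t)

prefs = np.array([0.2, 0.0, -0.1, 0.4])    # hypothetical action preferences

for t in (0, 1_000, 10_000, 100_000):
    p = boltzmann(prefs, temperature_schedule(t))
    print(t, np.round(p, 3))               # distribution sharpens as T falls
```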
Keywords/Search Tags:Reward collection, Dynamic, Policies, Application, Algorithm, Targets