Research And Design Of Soccer Robot Decision System Based On SARSA Algorithm

Posted on: 2018-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Song
Full Text: PDF
GTID: 2348330533969222
Subject: Computer Science and Technology
Abstract/Summary:
RoboCup 2D simulation robot soccer is a platform for research on multi-agent robot systems, on which researchers can test different machine learning algorithms. Reinforcement learning is one of the most important families of machine learning algorithms: the agent interacts continuously with the environment so as to maximize its cumulative reward. Under certain conditions, reinforcement learning guarantees that the agent's learning converges to the optimal strategy. It has been applied widely and successfully to chess, backgammon, Tetris and other games, but it has not been fully studied in the RoboCup 2D simulation game. This paper introduces the SARSA algorithm into the RoboCup 2D simulation game and improves it. The state space of the player agent is mapped from the position of the defensive player and the position of the ball. Using this mapping and its corresponding precondition function, which is the basis on which SARSA selects actions, we design and implement the SARSA algorithm on the Helios framework. Drawing on domain knowledge of soccer, this paper proposes two reward shaping functions, a separation-based one and a transfer-distance-based one, that improve the team's performance. In multi-agent systems, a single agent often obtains only a sparse Q table through reinforcement learning, which cannot represent the global situation of the whole system. To solve this problem, we build on multi-agent shared Q table methods and put forward a Q integration algorithm, which gives the team a higher winning percentage. Because reinforcement learning algorithms often cannot guarantee that the Q table converges to the optimal strategy, this paper compares the convergence of the adaptive ε-greedy action selection strategy with that of the fixed ε-greedy strategy and, on the basis of the experimental results, adopts the adaptive ε-greedy strategy. The paper then compares the effects of different reward functions on goal scoring and determines the reward function accordingly. We also compare the SARSA algorithm under the two reward shaping functions and verify that these methods benefit the team. Finally, a number of matches against World Cup teams are played and the results are analyzed statistically, which verifies the validity of the algorithm.
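
For readers unfamiliar with the method, the following is a minimal sketch of tabular SARSA with a decaying ε-greedy policy. It is not the thesis's implementation. The corridor environment (`step`), the hyperparameters, and the exponential decay schedule are all assumptions made for illustration; the abstract does not specify the adaptive ε-greedy rule, the state mapping, or the reward functions used on the Helios framework.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    best = max(q[(state, a)] for a in actions)
    # Break ties between equally valued actions at random.
    return rng.choice([a for a in actions if q[(state, a)] == best])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, terminal):
    """On-policy TD update: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    target = r if terminal else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


def step(state, action, goal=4):
    """Hypothetical toy corridor: states 0..goal, actions -1/+1, reward 1 at the goal."""
    nxt = min(max(state + action, 0), goal)
    done = nxt == goal
    return nxt, (1.0 if done else 0.0), done


def train(episodes=300, alpha=0.5, gamma=0.9,
          eps_start=1.0, eps_min=0.05, decay=0.98, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # the Q table, keyed by (state, action)
    actions = [-1, 1]
    epsilon = eps_start
    for _ in range(episodes):
        s = 0
        a = epsilon_greedy(q, s, actions, epsilon, rng)
        done = False
        while not done:
            s_next, r, done = step(s, a)
            # SARSA picks the next action with the same policy it is learning.
            a_next = epsilon_greedy(q, s_next, actions, epsilon, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done)
            s, a = s_next, a_next
        # Assumed schedule: shrink epsilon each episode, down to a floor.
        epsilon = max(eps_min, epsilon * decay)
    return q


if __name__ == "__main__":
    table = train()
    for state in range(4):
        print(state, round(table[(state, -1)], 3), round(table[(state, 1)], 3))
```

A fixed ε-greedy strategy corresponds to setting `decay=1.0` in this sketch, which gives a simple way to compare the two selection strategies on the same task.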
Keywords/Search Tags: reinforcement learning, SARSA algorithm, ε-greedy, reward shaping