Study On The Learning And Planning Algorithm Of Intelligent Agent Based On Performance Potentials

Posted on:2015-03-20

Degree:Master

Type:Thesis

Country:China

Candidate:H H Huang

Full Text:PDF

GTID:2268330428997037

Subject:Control Science and Engineering

Abstract/Summary:

AI planning and machine learning have long been research hotspots in Artificial Intelligence (AI). Many real-world sequential decision-making problems can be described as Markov decision processes (MDPs). The concept of performance potentials, defined on MDPs, provides a new theoretical framework for solving and optimizing such problems. It also allows a system with unknown parameters to be optimized online from estimates along a sample path. When system parameters such as the state transition probability function and the reward function are not known in advance, reinforcement learning is used to learn the system's optimal policies. Because neither approach requires these parameters to be known in advance, reinforcement learning combines well with the theory of performance potentials to yield more efficient online optimization algorithms.

In recent years, however, as the applications of AI have expanded, solving large-scale decision-making problems under uncertainty has become one of the main challenges for AI theoretical research. To address the "curse of dimensionality" in agent decision problems, this thesis proposes a learning algorithm combined with heuristic search, and analyzes its model and effectiveness on decision-making problems in the RoboCup 2D soccer simulation platform.

Specifically, the main contributions and achievements of this thesis are as follows:

(1) The first part gives a basic introduction to reinforcement learning algorithms, performance potentials theory, and heuristic search methods, together with their development status. It then analyzes their advantages and disadvantages in solving problems and illustrates the significance of using these algorithms in soccer simulation research.

(2) To address the disadvantages of reinforcement learning and performance potentials theory on large-scale problems, such as instability during the solving process and the learning time required, this work presents a new algorithm based on performance potentials theory and heuristic search, called the A* Average Reward Reinforcement Learning Algorithm Based on Performance Potentials (referred to below as GA*-learning). A heuristic function, which influences the choice of actions according to heuristic policies, is used in G-learning to accelerate convergence. A series of experiments on the Grid-World benchmark problem, run with the RL toolbox, tests and analyzes the effectiveness of the algorithm.

(3) Based on Keepaway, a simplified simulator for the robot soccer domain on the RoboCup 2D soccer simulation platform, we design an action generator according to Option theory. We then combine it with the GA*-learning algorithm and apply both to the agent decision process, which improves the performance of the agent's skills.

In conclusion, this thesis proposes a new algorithm, GA*-learning, based on reinforcement learning and performance potentials theory, and performs a series of experiments to verify its significance. All of the above work has been applied to the code design of the 2D soccer simulation team GDUT_TiJi, which participated in RoboCup 2013 and RoboCup China Open 2013 and won first prize and 9th place, respectively.
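The idea of letting a heuristic function bias action choice in an average-reward learner can be illustrated with a small sketch. The code below is not the thesis's GA*-learning algorithm, whose details are not given in this abstract. It is a generic tabular average-reward temporal-difference update on an assumed 4x4 grid world, with an assumed Manhattan-distance heuristic added to the action values at selection time. All names and parameters (`SIZE`, `GOAL`, `xi`, `alpha`, `beta`, `epsilon`) are hypothetical choices made for illustration.

```python
import random

SIZE = 4                      # assumed 4x4 grid, for illustration only
GOAL = (3, 3)                 # assumed goal cell
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def move(state, action):
    """Return the cell reached by the action, clipped to the grid borders."""
    dr, dc = ACTIONS[action]
    return (min(max(state[0] + dr, 0), SIZE - 1),
            min(max(state[1] + dc, 0), SIZE - 1))


def step(state, action):
    """Continuing task: reaching GOAL pays 1 and restarts the agent at (0, 0)."""
    nxt = move(state, action)
    if nxt == GOAL:
        return (0, 0), 1.0
    return nxt, 0.0


def heuristic(state, action):
    """Assumed heuristic: negative Manhattan distance to GOAL after the move."""
    r, c = move(state, action)
    return -(abs(GOAL[0] - r) + abs(GOAL[1] - c))


def choose(q, state, xi, epsilon, rng):
    """Epsilon-greedy choice over action value plus a weighted heuristic bonus."""
    if rng.random() < epsilon:
        return rng.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: q[(state, a)] + xi * heuristic(state, a))


def train(steps=5000, alpha=0.1, beta=0.01, xi=0.5, epsilon=0.1, seed=0):
    """Run a generic average-reward TD update with heuristic action selection."""
    rng = random.Random(seed)
    q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    rho = 0.0                 # running estimate of the average reward per step
    state = (0, 0)
    for _ in range(steps):
        action = choose(q, state, xi, epsilon, rng)
        nxt, reward = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Differential TD error: the reward is measured relative to rho.
        td = reward - rho + best_next - q[(state, action)]
        q[(state, action)] += alpha * td
        rho += beta * td
        state = nxt
    return q, rho
```

Calling `q, rho = train()` returns the learned action values and the average-reward estimate. In this sketch the heuristic only affects which action is tried, not the update itself, so a poor heuristic slows exploration but does not change the update rule.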

Keywords/Search Tags:

Reinforcement Learning, Markov Decision Process, Performance Potentials Theory, Heuristic Search, RoboCup

Related items

1	Heuristic Learning Model Based On Partially Observable Markov Decision Process
2	Parallel Algorithms For Large-Scale Markov Decision Processes Based On Performance Potentials
3	Unified Algorithms For Semi-Markov Decision Processes With Discounted And Average Criteria Based On Performance Potentials By Reinforcement Learning
4	Inverse Reinforcement Learning Algorithms In Semi-Markov Environment
5	Research On Agent Decision Problem Based On Markov Decision Process Theory
6	Continuous Time Hierarchical Reinforcement Learning Algorithm
7	Research On Reinforcement Learning Based Communication Jamming Strategy Learning Methods
8	Research On Sample-efficient Reinforcement Learning Methods
9	Theories, Algorithms And Applications Of Policy Gradient Reinforcement Learning
10	Reinforcement Learning On Named Entity Recognition Task