
TD3 And Physics Simulation Reasoning Based Open Task Solving

Posted on: 2021-10-31    Degree: Master    Type: Thesis
Country: China    Candidate: S T Chai    Full Text: PDF
GTID: 2518306569496614    Subject: Software engineering
Abstract/Summary:
This thesis studies how to help an agent use a policy learnt while exploring an environment to solve open tasks, and how to generalize that policy to new environments. Open tasks typically have sparse rewards and a huge exploration space, so a better exploration strategy is needed to speed up the agent's learning. To ensure that the learnt knowledge is not limited to a specific task, and to help the agent obtain the sparse reward, intrinsic rewards are introduced to encourage exploration. A task requiring the end effector of a manipulator to reach a specific body is designed with the robot learning framework Pyrobolearn, and new reinforcement learning algorithms are proposed to train the agent in a task-agnostic environment without any task-related reward, such that the learnt policy generalizes to the proposed task.

Building on the TD3 and HER algorithms, a locality sensitive hashing (LSH) based counting reward is introduced to encourage exploration of the unknown environment: the LSH method discretizes the state space and counts the number of visits to each discretized state, and frequently visited states yield a smaller reward, steering the agent towards less visited ones.

Building on HER, a trajectory replacement strategy is proposed. It replaces every failed trajectory with one that is more likely to succeed, and stores both the replaced trajectory and the original in the replay buffer for training. Trajectory replacement does not require the state vector to consist of achieved and desired goals, so it adapts to broader cases and more complex tasks.

A mixed Gaussian noise layer is proposed to provide adaptive policy noise. Mixed Gaussian noise is obtained by using a fully connected layer to rescale a Gaussian noise vector; it converts the deterministic action produced by the actor network into a randomly exploring action, and the fully connected layer weights, which decide the quality of exploration, are learnt automatically by back-propagation. Experimental results indicate that the agent with mixed Gaussian noise converges to a higher environmental reward and a higher success rate.

Finally, a reward based on physics simulation time is proposed. Solving open tasks often involves complicated environment dynamics that cost more time to simulate, so the simulation time itself can serve as a reward that improves the agent's exploration strategy and generalizability. The simulation-time reward encourages the agent to learn a generalizable policy in a task-agnostic environment and then adapt its policy once given a task setting with sparse rewards. Experimental results indicate that the simulation-time reward is correlated with the task-specific reward and can be used to improve the exploration strategy. Minimal code sketches of the counting reward, the trajectory replacement, the noise layer, and the simulation-time reward follow below.
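The abstract does not give implementation details of the LSH counting reward, so the following is a minimal sketch assuming a SimHash-style scheme: a fixed random projection hashes continuous states into binary codes, visit counts are kept per code, and the intrinsic bonus decays with the count. The class name, the hash_bits and beta parameters, and the 1/sqrt(count) bonus form are illustrative assumptions, not necessarily the thesis's exact design.

```python
import numpy as np
from collections import defaultdict

class SimHashCounter:
    """Count-based exploration bonus via locality sensitive hashing.

    A sketch: states are projected with a fixed Gaussian matrix, the
    sign pattern serves as the hash bucket, and the intrinsic reward
    shrinks as the bucket's visit count grows.
    """
    def __init__(self, state_dim, hash_bits=16, beta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((hash_bits, state_dim))  # random projection
        self.beta = beta
        self.counts = defaultdict(int)

    def intrinsic_reward(self, state):
        # Discretize the continuous state into a binary hash code.
        code = tuple((self.A @ np.asarray(state) > 0).astype(np.int8))
        self.counts[code] += 1
        # Rarely visited buckets yield a larger bonus: beta / sqrt(n).
        return self.beta / np.sqrt(self.counts[code])
```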
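The replacement rule itself is not spelled out in the abstract. The sketch below therefore shows the classic HER "final" relabeling as a stand-in: a failed trajectory is rewritten as if the goal actually reached at its end had been the desired goal, yielding a trajectory that succeeds, and both versions go into the replay buffer. Note that the thesis's variant explicitly avoids the achieved-goal/desired-goal structure assumed here; this sketch only illustrates the general idea. All names and the transition format are assumptions.

```python
def her_final_relabel(trajectory, reward_fn):
    """HER-style stand-in for trajectory replacement (sketch).

    `trajectory` is assumed to be a list of transition dicts with keys
    'state', 'action', 'achieved_goal', 'desired_goal', 'next_state'.
    `reward_fn(achieved_goal, desired_goal)` recomputes the reward.
    """
    new_goal = trajectory[-1]['achieved_goal']  # goal the agent actually reached
    replaced = []
    for t in trajectory:
        t2 = dict(t)
        t2['desired_goal'] = new_goal
        t2['reward'] = reward_fn(t['achieved_goal'], new_goal)
        replaced.append(t2)
    return replaced

# Both the original and the replaced trajectory are stored for training:
# replay_buffer.extend(trajectory)
# replay_buffer.extend(her_final_relabel(trajectory, reward_fn))
```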
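A minimal PyTorch sketch of the mixed Gaussian noise layer as described: a fully connected layer rescales a standard normal vector, and the result is added to the actor's deterministic action, so the mixing weights are trained by back-propagation together with the actor. The layer sizes and the additive combination are assumptions where the abstract is silent.

```python
import torch
import torch.nn as nn

class MixedGaussianNoise(nn.Module):
    """Mixed Gaussian noise layer (sketch).

    A linear layer mixes an i.i.d. standard normal vector into an
    action-sized noise term; adding it to the deterministic action
    yields a randomly exploring action whose noise scale and
    correlation adapt during training.
    """
    def __init__(self, noise_dim, action_dim):
        super().__init__()
        self.mix = nn.Linear(noise_dim, action_dim, bias=False)
        self.noise_dim = noise_dim

    def forward(self, deterministic_action):
        eps = torch.randn(deterministic_action.shape[0], self.noise_dim,
                          device=deterministic_action.device)
        return deterministic_action + self.mix(eps)  # exploring action
```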
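The simulation-time reward can be illustrated with a generic environment wrapper that times the physics step and returns the elapsed wall-clock time as the reward, following the thesis's premise that dynamically complex interactions take longer to simulate. The wrapper name, the gym-style step signature, and the scale factor are assumptions; the thesis builds on Pyrobolearn, whose API is not reproduced here.

```python
import time

class SimTimeRewardWrapper:
    """Simulation-time intrinsic reward (sketch).

    Wraps any environment exposing step(action) -> (obs, reward, done,
    info) and replaces the reward with the wall-clock time consumed by
    the physics step, scaled by an illustrative factor.
    """
    def __init__(self, env, scale=1.0):
        self.env = env
        self.scale = scale

    def step(self, action):
        start = time.perf_counter()
        obs, _reward, done, info = self.env.step(action)
        sim_time = time.perf_counter() - start  # proxy for dynamic complexity
        return obs, self.scale * sim_time, done, info
```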
Keywords/Search Tags:Meta Learning, Reinforcement Learning, Robotics, Physics Simulation, Sparse Reward