
Research On Deep Reinforcement Learning Algorithm And Applications

Posted on: 2020-06-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Yuan
GTID: 1368330620458586
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
At present, research on robot technology has shifted from traditional mechanical dynamics toward intelligent control. Especially after absorbing research results from control theory, artificial neural networks, and machine learning, robot technology has gradually become one of the core areas of artificial intelligence. How to endow robots with the ability to learn autonomously is one of the keys to the development of robot technology, and it is a current focus of research in the field. Only a robot that can learn autonomously can be called an intelligent robot. Therefore, designing better machine learning algorithms and using them to raise the intelligence level of robots is of great and far-reaching significance.

Reinforcement learning is one of the important classes of algorithms in machine learning. Its most distinctive characteristic is that it learns autonomously through continuous interaction with the environment, without being given labeled training data. Reinforcement learning is one of the core technologies for improving the intelligence level of robots, and in recent years the combination of reinforcement learning and deep learning has shown strong learning ability. Although deep reinforcement learning has made great progress in making robots more intelligent and has achieved many successes, research on it is still in its infancy. Problems and challenges remain in practical applications, such as reward hacking, data efficiency, and motion smoothness. These shortcomings directly limit the application of reinforcement learning in real environments and can even damage the agent. Therefore, based on these problems and challenges, this thesis improves existing reinforcement learning algorithms and proposes three new algorithms. The main contents and results of this thesis are as follows:

1. A new on-policy multi-step reinforcement learning algorithm is proposed to solve the problem of reward hacking. Because of reward hacking, reinforcement learning can produce unexpected behaviors in practical applications. Such behaviors may subvert the designer's intentions and prevent the robot from moving in the way the designer expects. To solve this problem, a new on-policy multi-step reinforcement learning algorithm is proposed in this thesis. Unlike traditional algorithms, the proposed method uses a new return function, which alters the discounting of future rewards and no longer treats the immediate reward as the main influence when selecting an action in the current state. The performance of the proposed method is evaluated on two games, Mappy and Mountain Car. The empirical results demonstrate that the proposed method can alleviate the negative impact of reward hacking and greatly improve the performance of the reinforcement learning algorithm.

2. A new off-policy multi-step reinforcement learning algorithm is proposed to solve the problem of data efficiency. In deep reinforcement learning, the agent learns through trial and error. As the environment becomes more complex, the agent requires a large amount of time and data for learning. Improving data efficiency and reducing training time is therefore an urgent problem in deep reinforcement learning. In addition, data inefficiency causes the agent to try a large number of dangerous actions during training, which further affects the safety of the learning system. To solve this problem, a new off-policy multi-step reinforcement learning algorithm is proposed in this thesis. By combining the proposed method with classic deep reinforcement learning algorithms, two novel algorithms are proposed for improving the efficiency of learning from experience replay. The performance of the proposed algorithms is validated in two simulation environments, CartPole and DeepTraffic. The experimental results demonstrate that the proposed multi-step methods greatly improve the data efficiency of DRL agents.

3. A new multi-step reinforcement learning algorithm based on Off-Policy is proposed to solve the problem of motion smoothness. Because the robot's joints are driven by motors, large fluctuations in the motion trajectory (angle, angular velocity, and angular acceleration trajectories) cause large fluctuations, or even abrupt jumps, in the motor's driving torque. This can damage the robot's joints. Therefore, like human motion, the robot's motion trajectory needs to be smooth, without sudden acceleration or jerk. However, a good deep reinforcement learning algorithm alone is not enough to solve this problem. The main reason is that the control policy of traditional deep reinforcement learning algorithms is generated step by step, which tends to make the robot complete the task quickly rather than imitate the smooth movement of the teaching trajectory. To solve this problem, a new deep reinforcement learning algorithm based on dynamic movement primitives is proposed in this thesis. Unlike traditional algorithms, the new algorithm consists of two learning hierarchies: the lower-level controller learning hierarchy and the upper-level policy learning hierarchy. In the new algorithm, the learning of the meta-parameters and the generation of motion trajectories from those meta-parameters can be trained independently. This makes full use of the advantages of both dynamic movement primitives and deep reinforcement learning. As a result, the robot can not only generate smooth trajectories but also learn autonomously. The performance of the new algorithm is evaluated on a UR5 robot. The experimental results demonstrate that the proposed algorithm can endow robots with human-like abilities to perform motor skills in a smooth and natural way.
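
For orientation on the multi-step methods in items 1 and 2: the abstract does not give the form of the thesis's new return function, so the Python sketch below shows only the standard n-step discounted return that multi-step methods conventionally start from. The function name `n_step_return` and its parameters are illustrative assumptions, not the author's implementation.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float],
                  bootstrap_value: float,
                  gamma: float) -> float:
    """Standard n-step return (textbook form, NOT the thesis's modified return).

    Computes  sum_{k=0}^{n-1} gamma**k * rewards[k] + gamma**n * bootstrap_value,
    where n = len(rewards) and bootstrap_value is an estimate of the value
    of the state reached after the n steps.
    """
    g = bootstrap_value
    # Fold the rewards in from the last step back to the first,
    # applying one factor of gamma per step.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Three steps of reward 1.0, bootstrap estimate 8.0, discount 0.5.
    print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5))
```

With a discount near 0 this return is dominated by the immediate reward, which is the emphasis that item 1 says the new return function moves away from. How the thesis actually reweights future rewards is not stated in the abstract.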
Keywords/Search Tags: machine learning, deep reinforcement learning, reward hacking, dynamic movement primitives, robot