Robotic manipulation is the primary means by which robots interact with and modify the external environment in real time to accomplish specified tasks. It plays an important role in industrial manufacturing, service industries, and daily life. Dexterous robotic manipulation in particular can execute complicated tasks on behalf of humans and has therefore attracted researchers’ attention. However, dexterity also means that robots have a high-dimensional action space and often operate in dynamic, highly complex environments. Uncertainty in the behavior of underactuated robots, model imprecision, limited environmental perception, noise during manipulation, and the lack of interpretability of robot behavior all pose challenges to manipulation tasks. Designing and optimizing robot control strategies to achieve robust manipulation control is therefore a pressing problem. In this thesis, we take underactuated robot manipulation in unstructured environments as the background and conduct research in three areas: robot design optimization and safe control; active perception and control policy transfer; and subtask decomposition and task planning. We aim to improve robot performance, safety, and behavioral interpretability by applying analytical modeling, reinforcement learning, and imitation learning. The main contributions are as follows:

(1) Robot parameter optimization and safe control under simplified models of underactuated soft robots are studied. Soft robots have highly redundant degrees of freedom and often lack analytical methods for efficient design optimization and control. To address this, an approximate model combined with numerical iteration methods is proposed for actuator parameter optimization and performance improvement. In addition, because a soft robot can only adapt passively to its environment and cannot otherwise guarantee safety in dangerous situations, an impact model is proposed to realize
collision detection and active safe reaction, leveraging the inherent compliance of the soft structure. The experimental results show that the proposed method can efficiently evaluate soft robot parameters and that the safe reaction method achieves reliable active safe control.

(2) Active perception, dynamics modeling, and reinforcement learning policy transfer are investigated. Because robots lack the ability to perceive objects in unstructured environments, active perception is proposed for exploring object surfaces. In addition, when transferring policies to new environments, simultaneous optimization of the environmental dynamics and the control policy is proposed to deal with the domain shift problem. We combine a model-based reinforcement learning policy with numerical optimization and adaptive transfer learning methods, and use trajectory optimization as guidance for policy search. The experimental results show that the proposed active perception strategy improves the robot’s perception and understanding of the environment, and that simultaneous model parameter estimation and policy search can effectively correct environmental parameter errors and extract information from the prior model, thus ensuring faster adaptation to new environments and reliable policy transfer.

(3) The use of action primitives for subtask recognition and segmentation to achieve robot task planning is studied. End-to-end reinforcement learning lacks interpretability and constraints on the state space, which leads to low policy reliability. To address this, a two-layer policy network with clear physical meaning and a corresponding stable training method are proposed. Furthermore, because traditional robot task planning methods cannot express complex dexterous manipulation processes and data-driven methods lack interpretability of robot behaviors, we propose to use goal-conditioned action primitives combined with imitation learning
methods to decompose a multi-phase complex task into aligned primitive subtasks and extract subtask execution goals. The experimental results show that the proposed policy structure and training method achieve higher robustness, and that the task planning method extracts information from expert demonstrations in greater depth to achieve behaviorally interpretable task planning, improve human trust, and enhance situational awareness in dynamic environments.