With the continuous development of China's aerospace technology, the demand for large-scale space equipment and space systems has become increasingly urgent. Constrained by current technical conditions, it is difficult to launch large space equipment into orbit directly; dividing it into multiple subsystem modules and assembling them on orbit after batch launches is the most feasible solution at this stage. The importance of on-orbit assembly technology for the future of the aerospace industry is therefore self-evident. On-orbit assembly tasks are complex, the working modules are intricate and expensive, and extremely high motion accuracy and end-effector compliance are demanded of the robotic arm. Redundant manipulators offer high control accuracy and flexibility, and most research on on-orbit assembly, both in China and abroad, takes the redundant manipulator as its core module. The kinematics solution is the basis of manipulator motion control; trajectory planning is the premise of precise manipulator motion; and impedance control with environmental adaptability allows the manipulator to perform various on-orbit tasks safely. These three technologies are the key technologies of the on-orbit assembly robotic arm. However, the complex kinematic and dynamic models of redundant manipulators pose great challenges to realizing them. Intelligent control algorithms, which are data-driven and task-oriented, provide new ideas for realizing and optimizing these key technologies. Facing the nation's major demand for space intelligence, this work studies manipulator intelligence for the three key technologies of on-orbit assembly, namely the kinematics solution, end-effector trajectory planning, and impedance control, and provides a technical foundation and engineering experience for space intelligence. The main research contents are as follows. (1) To address the slow iteration of numerical algorithms for
the inverse kinematics solution, an intelligent inverse kinematics algorithm based on K-means++ clustering and the Deep Deterministic Policy Gradient (DDPG) is proposed. The iteration speed of the inverse kinematics solution depends on the initial value and the iteration factor: the closer the initial value is to the target point, the faster the iteration converges, while the iteration factor determines how far each iteration step advances. First, for the initial-value problem, the K-means++ clustering algorithm partitions the set of reachable points in the manipulator's workspace by spatial distance. The joint angles corresponding to the center of the cluster containing the target point are used as the initial value of the iteration, which effectively reduces the spatial distance between the initial value and the target. Experiments show that this reduces the number of iteration steps by 32.1%. Then, for the iteration-factor problem, reinforcement learning is used to train the manipulator to select an appropriate iteration factor at each step, accelerating convergence as much as possible. DDPG is a reinforcement learning method suited to continuous state and action spaces; it can perceive the motion state of the manipulator and generate appropriate iteration factors according to the learned policy. Experiments show that DDPG alone reduces the iteration steps by 28.5% and the solution time by 19.7%. Finally, combining K-means++ with DDPG reduces the number of iteration steps by up to 42% and the solution time by 25.1%. (2) To meet the need for autonomous trajectory planning during assembly, a Dynamic Movement Primitives (DMP) trajectory planning
algorithm based on Policy Improvement with Path Integrals (PI2) is proposed. A DMP can generalize a new trajectory from a demonstrated one; the shape of the new trajectory is determined by the DMP's trajectory-learning parameters, and the better these parameters fit the original trajectory, the higher the controllability of the DMP. Adding an artificial potential field to the DMP lets the generated trajectory avoid obstacles of known shape and position, with the obstacle-avoidance performance determined by the potential-function parameters. Controllability and obstacle-avoidance performance therefore affect each other, and choosing the optimal trajectory-learning and potential-function parameters is an important high-dimensional optimization problem. PI2 is a model-free, sampling-based learning method that requires no algorithm parameters to be tuned apart from the exploration noise, making it especially suitable for high-dimensional optimization problems with many degrees of freedom. With a suitably designed reward function, PI2 can optimize the trajectory-learning parameters and the potential-function parameters of the DMP simultaneously. Simulation and experiments show that, while guaranteeing successful obstacle avoidance, the PI2-optimized DMP improves trajectory fit by 10.4% over a DMP with fixed parameters, giving the redundant manipulator a basic ability to plan trajectories autonomously. (3) To meet the requirement for end-effector compliance during assembly, an impedance control algorithm based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) is proposed. During assembly, end-effector compliance improves the safety of the load and of the objects being manipulated. Impedance control can effectively realize the manipulator's compliance to external forces, but tuning the impedance parameters is complicated, and a fixed set of parameters is only applicable to a
single scenario. TD3 is another deep reinforcement learning algorithm suited to continuous state and action spaces; it addresses shortcomings of DDPG, such as the overestimation of value estimates, and performs better in environments with high-dimensional action spaces, making it well suited to implementing impedance control. The TD3-based intelligent impedance control algorithm can autonomously select impedance parameters adapted to the environment according to the force measured at the end of the manipulator. To quantify compliance to external forces, the jerk of the end-effector is used as a penalty term: the greater the jerk, the poorer the compliance to the external force. Experiments verify that the TD3-based impedance controller automatically adjusts the impedance parameters according to the end-effector force; compared with a fixed-parameter impedance controller, the norm of the end-effector jerk is reduced by 79.7%.
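The jerk-norm penalty above can be illustrated with a minimal one-dimensional sketch of the classical impedance relation M*a + B*v + K*x = F_ext, where the jerk of the end-effector is the time derivative of its acceleration. All numbers below (mass, gains, force profile) are illustrative assumptions, not values from the experiments:

```python
import numpy as np

def simulate_impedance(M, B, K, f_ext, dt=0.001, steps=2000):
    """Integrate the 1-DOF impedance model M*a + B*v + K*x = f_ext(t)
    with semi-implicit Euler, and return the position trajectory together
    with the norm of the finite-difference jerk (the penalty signal)."""
    x = v = prev_a = 0.0
    xs, jerks = [], []
    for i in range(steps):
        a = (f_ext(i * dt) - B * v - K * x) / M   # impedance dynamics
        jerks.append((a - prev_a) / dt)           # jerk = da/dt (finite diff.)
        prev_a = a
        v += a * dt
        x += v * dt
        xs.append(x)
    return np.array(xs), float(np.linalg.norm(jerks))

# Hypothetical contact force: 5 N rising smoothly with a 50 ms time constant.
f_smooth = lambda t: 5.0 * (1.0 - np.exp(-t / 0.05))

# Two candidate impedance settings with the same stiffness: one lightly
# damped, one well damped (the kind of choice a learned policy must make).
_, jerk_light = simulate_impedance(M=1.0, B=5.0,  K=500.0, f_ext=f_smooth)
_, jerk_heavy = simulate_impedance(M=1.0, B=40.0, K=500.0, f_ext=f_smooth)
print(jerk_light, jerk_heavy)  # the well-damped setting incurs the smaller penalty
```

A learning agent such as the TD3 controller in this work would treat the end-effector force and motion as the state, the impedance parameters (B, K here) as the action, and a penalty derived from this jerk norm as part of the reward, selecting the setting that responds to contact most smoothly.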