The ability of a robot to autonomously recognize and grasp objects determines the upper limit of its task execution capability. In robot grasping applications, factors such as complex working environments and diverse target objects are major limitations on the stability, accuracy, and robustness of robot grasping. To address the problems of single grasping patterns and the difficulty of estimating the pose of unknown objects in unstructured environments, this paper proposes a multi-modal image fusion and zero-shot grasp detection network, with contributions in three aspects:

1. A robot grasp detection network based on multimodal dynamic collaborative fusion is proposed. To flexibly combine the advantages of multiple fusion methods, an end-to-end grasp detection network is built. The proposed method extracts the features required at different fusion stages from the two input modalities and fuses them through dynamic collaborative fusion. This effectively reduces the adverse effects of individual fusion stages and fully exploits the advantages of multimodal feature fusion.

2. A zero-shot grasp detection network is proposed. The concept of zero-shot learning is introduced into grasp detection: visual features of grasp targets are generated from language-level semantics, enabling the network to produce grasp pose bounding boxes without requiring any image samples of the target object. This significantly reduces training costs while effectively improving the target detection capability of the network.

3. A 6-DOF robot control system based on the ROS platform is constructed, and the two grasp detection network models proposed in this paper are deployed in it to achieve autonomous localization and grasping by the robot. The Gluon robotic arm is used to build the control system, with the open-source ROS platform serving as the robot's underlying software framework. The Kinect DK depth camera, driven within ROS, serves as the visual perception unit, and the grasp detection models are deployed alongside it, enabling real-time acquisition of multimodal image data of grasp targets for grasp detection. Finally, the MoveIt! motion planning component is used to achieve the research objective of autonomous recognition and grasping by the robot.
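The dynamic collaborative fusion idea in contribution 1 can be illustrated with a minimal sketch: a learned per-channel gate that weights the contribution of each modality before summation, so the network can lean on whichever modality is more informative for each feature channel. This is an illustrative stand-in, not the thesis's actual architecture; the function name, gate parameterization, and the assumption that the two modalities are RGB and depth feature maps (suggested by the Kinect DK depth camera) are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_collaborative_fusion(rgb_feat, depth_feat, gate_logits):
    """Fuse two modal feature maps with a per-channel dynamic gate (sketch).

    rgb_feat, depth_feat: (C, H, W) feature maps from the two modal branches.
    gate_logits: (2, C) logits; softmax over the modality axis yields
    per-channel weights that sum to 1, letting the fusion emphasize one
    modality over the other on a channel-by-channel basis.
    """
    weights = softmax(gate_logits, axis=0)           # (2, C)
    w_rgb = weights[0][:, None, None]                # broadcast over H, W
    w_depth = weights[1][:, None, None]
    return w_rgb * rgb_feat + w_depth * depth_feat   # (C, H, W) fused map
```

In a real network the gate logits would themselves be predicted from the input features (making the fusion "dynamic"), and such gates could be placed at several stages of the backbone to realize the multi-stage collaboration described above.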
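The zero-shot mechanism in contribution 2 can likewise be sketched in its simplest form: score each spatial cell of a visual feature map against a language-level embedding of the (possibly unseen) class name, and use the resulting heat map to seed a grasp bounding box. All names and shapes here are hypothetical; the thesis's network generates visual features from semantics rather than doing raw cosine matching, so this is only a schematic of the shared embedding-space idea.

```python
import numpy as np

def zero_shot_locate(visual_feats, text_embedding):
    """Locate a target described only by a text embedding (sketch).

    visual_feats: (H, W, D) image feature map; text_embedding: (D,)
    semantic vector for the class name. Cosine similarity per cell
    yields a heat map; the argmax cell is a candidate centre for the
    grasp pose bounding box, with no image samples of the class needed.
    """
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + 1e-8)
    t = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    heat = v @ t                                     # (H, W) cosine scores
    y, x = np.unravel_index(np.argmax(heat), heat.shape)
    return heat, (y, x)
```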