The development of deep learning has brought new possibilities to robot visual inspection systems. Object detection converts sensor data from robotic devices into structured, semantically meaningful information, enabling the classification and localization of targets in images. In the field of robot vision, conventional RGB (Red-Green-Blue) image data inherently lacks physical distance information between the mobile platform and the perceived target, so other data sources are needed to remedy this shortcoming. Depth data naturally contains distance information, but research on detection networks built on depth data remains limited. To address these issues, this paper presents a new depth pedestrian dataset to compensate for the lack of datasets in this field. It also evaluates the performance of current mainstream visual object detection networks on multimodal data, selects the best-performing detection algorithm for improvement, and raises its AP (Average Precision) on the depth dataset from 0.956 to 0.978. In addition, because the data captured by mobile robots are often sequential, this paper further proposes a new sequence detection network, TDS-DETR, which introduces innovations in model structure, positional encoding, and sequence matching to detect serialized data. Compared with a detection model using two-dimensional positional encoding, TDS-DETR improves AP on the depth pedestrian dataset by 11.4% and reaches 92.9% on short-sequence depth data.