With the progress of technology, society is gradually shifting from an information society to an intelligent society. Video surveillance is now deployed on roads, in cities, in security applications, in factories, in homes, and in many other settings. Current video surveillance systems mainly use visual sensors to capture and store video of the monitored scene. Because these systems lack intelligent analysis, they cannot make comprehensive use of the information from their individual visual sensors. With advances in video analysis technology, video surveillance is moving toward intelligent analysis and the fusion of information from multiple visual sensors. Regional surveillance cameras are usually mounted on building walls and similar locations, with a wide field of view to cover the monitored area. However, because of building structures and other obstructions, surveillance cameras often have blind spots. This paper therefore studies the integration of regional surveillance cameras and robot vision sensors to form a visual scene perception system in which global and local views complement each other. A multi-task, real-time target detection, segmentation, and localization system is constructed to analyze the scene in detail, and the information from the regional surveillance cameras and the robot vision sensor is fused to search for characteristic targets in the robot's working scene. The main research contents are summarized as follows:

(1) The research status of multi-sensor collaborative perception, and of deep-learning-based fusion for collaborative perception with multiple visual sensors, is reviewed. To address two problems, the lack of a clear understanding of complex environments and the shallow fusion of multi-source visual sensors, solutions such as a deep fusion system for multi-source sensors are proposed.

(2) To address the insufficient understanding of complex environments, a multi-task, real-time target detection, segmentation, and localization system is constructed. A multi-task model with a one-stage structure is built, and the TLFDNet network model is proposed to locate moving or stationary targets in scenes under video surveillance. The method uses monocular depth estimation to locate the detected targets, reducing the large errors of traditional positioning algorithms based on detection boxes. In addition, a collected environment dataset is used to train the model jointly on target detection and segmentation, so that it can segment the movable (traversable) area of the scene and support downstream tasks. Experiments verify the effectiveness of the algorithm.

(3) To address the limited target retrieval ability of a single sensor, this paper proposes a target retrieval system based on multi-source vision sensor fusion. Exploiting the mobility of the robot within the scene, information from the mobile robot's vision sensor and the video surveillance cameras is fused, which overcomes the limited retrieval ability of any single vision sensor and improves the timeliness of the target retrieval system. Experiments verify the effectiveness of the method.

(4) An experimental environment is built in a real scene and the multi-source visual perception system is constructed, with socket communication used for information exchange between all levels of the system. This achieves effective integration of the robot vision sensor and the regional surveillance information, and verifies the practicality of the system in a real scene.
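The abstract states that detected targets are located using monocular depth rather than the detection box alone, but it does not give the geometry. The following is a minimal sketch, assuming a standard pinhole camera with known intrinsics (fx, fy, cx, cy) and a per-pixel depth map in metres produced by some monocular depth estimator. It is not TLFDNet itself; the function name `localize_target`, its parameters, and the optional camera-to-world transform (`R_wc`, `t_wc`) are hypothetical, included to show how a surveillance camera's detection could be expressed in a world frame that a robot might also use.

```python
import numpy as np


def localize_target(bbox, depth_map, fx, fy, cx, cy, R_wc=None, t_wc=None):
    """Hypothetical sketch: estimate a target's 3D position from a box and a depth map.

    bbox           -- (x1, y1, x2, y2) integer pixel coordinates, x2 and y2 exclusive
    depth_map      -- H x W array of per-pixel depth in metres (assumed to come
                      from a monocular depth estimator); values <= 0 are invalid
    fx, fy, cx, cy -- pinhole intrinsics of the camera (assumed known)
    R_wc, t_wc     -- optional camera-to-world rotation (3x3) and translation (3,)
    """
    x1, y1, x2, y2 = bbox
    patch = depth_map[y1:y2, x1:x2]
    valid = patch[patch > 0]
    if valid.size == 0:
        return None  # no usable depth inside the box

    # The median depth is less sensitive to background pixels near the box edges
    # than the depth of a single pixel.
    z = float(np.median(valid))

    # Centre pixel of the box (pixels x1..x2-1 and y1..y2-1).
    u = 0.5 * (x1 + x2 - 1)
    v = 0.5 * (y1 + y2 - 1)

    # Pinhole back-projection: X = (u - cx) Z / fx, Y = (v - cy) Z / fy.
    p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

    if R_wc is None:
        return p_cam  # position in the camera frame
    # Express the point in a world frame shared with other sensors (assumption).
    return np.asarray(R_wc, dtype=float) @ p_cam + np.asarray(t_wc, dtype=float)


if __name__ == "__main__":
    depth = np.full((480, 640), 2.0)  # synthetic flat depth map, 2 m everywhere
    print(localize_target((420, 240, 421, 241), depth, 500.0, 500.0, 320.0, 240.0))
```

Using the median depth over the box rather than the depth at one pixel is one possible design choice; a segmentation mask, such as the one the multi-task model produces, could restrict the median to pixels belonging to the target.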