Tracking and navigation are long-standing research topics at the intersection of computer vision and control, with a wide range of promising applications such as unmanned aircraft delivery, autonomous driving, intelligent surveillance, and multipurpose robots. This paper focuses on active multi-object tracking (AMOT) and autonomous navigation, both of which use vision as the primary source of observations. The AMOT task requires multiple cameras to collaborate in tracking multiple targets, so that as many targets as possible are covered by the common field of view of the camera group. The autonomous navigation task takes depth images as the main input and requires an agent to reach a given destination from a given starting location as quickly as possible. Both tasks lie at the intersection of computer vision and control and share certain similarities. This paper proposes deep reinforcement learning based solutions for both tasks.

For the AMOT task, this paper first formulates it as a decentralized partially observable Markov decision process and proposes a collaborative multi-camera tracking method that uses coordinate alignment to integrate partial observations. To address the lack of effective datasets and testing environments, a new virtual environment is built to simulate real-world AMOT scenarios, and is used to train and evaluate the proposed method. In this environment, each camera acts as an agent whose observation consists of the RGB image from its current viewpoint and the pose of each camera, and YOLOv4-Tiny is used as the target detector to extract the targets' bounding boxes. The detections are then mapped to the targets' 3D global coordinates in the environment by an inverse projection transformation; the coordinates from all observations are aligned and fed to a deep Q-value network as joint features, which outputs the action for each camera. The model is trained with the A3C algorithm, using the temporal-difference (TD) error as the loss function. Experimental results show that the proposed method outperforms the fixed-camera baseline and achieves higher target coverage. The experiments also verify the rationality of the target detector and the observation integration, and the effectiveness of a reward function that combines team and individual rewards.

For the visual navigation task, this paper proposes a reinforcement learning based autonomous navigation method that takes depth images as input, building mainly on the rule-based strategy that won first place in the FPS AI competition in the WILD-SCAV environment. The model integrates the depth map and non-visual information into a 3D occupancy grid map and a heat map as the main part of the state representation, and a reinforcement learning network outputs actions along three dimensions: walking direction, camera orientation, and jumping. For training, this paper first leverages imitation learning on the winning rule-based policy to obtain a pre-trained model, and then trains the model on the navigation task with a distributed PPO algorithm. Subsequently, the model is further generalized to the supply gathering and supply battle tasks with additional rule-enhanced methods. Experimental results show that the proposed model outperforms a variety of rule-based and reinforcement learning strategies on all three tasks, including the winning rule used for imitation learning.
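The inverse projection transformation mentioned above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a pinhole camera with known intrinsics K and a known world-to-camera pose (R, t), and assumes the tracked targets stand on a known ground plane; the function name and the bottom-center-of-box convention are illustrative choices.

```python
import numpy as np

def pixel_to_world(u, v, K, R, t, ground_z=0.0):
    """Back-project a pixel (e.g. the bottom center of a detected
    bounding box) to 3D world coordinates by intersecting its viewing
    ray with the ground plane z = ground_z.

    K    : 3x3 camera intrinsic matrix
    R, t : world-to-camera rotation (3x3) and translation (3,)
    """
    # Viewing-ray direction in camera coordinates for pixel (u, v)
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])
    # Rotate the ray into world coordinates; recover the camera center
    ray_world = R.T @ ray_cam
    cam_center = -R.T @ t
    # Solve cam_center.z + s * ray_world.z == ground_z for the ray parameter s
    s = (ground_z - cam_center[2]) / ray_world[2]
    return cam_center + s * ray_world
```

Aligning the per-camera outputs of such a mapping into one global frame is what allows the observations of all cameras to be concatenated as joint features.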
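The one-step TD error used as the loss for the deep Q-value network can be sketched as below. The batch layout, function name, and use of a separate target network are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np

def td_loss(q_values, target_q_values, actions, rewards, dones, gamma=0.99):
    """Mean squared one-step temporal-difference (TD) loss.

    q_values        : (batch, n_actions) Q(s, a) from the online network
    target_q_values : (batch, n_actions) Q(s', a') from a target network
    actions, rewards, dones : (batch,) transition data
    """
    batch = np.arange(len(actions))
    # Bootstrapped TD target: r + gamma * max_a' Q(s', a'), zeroed at terminals
    targets = rewards + gamma * (1.0 - dones) * target_q_values.max(axis=1)
    td_error = targets - q_values[batch, actions]
    return np.mean(td_error ** 2)
```

Minimizing this squared TD error drives the network's Q-estimates toward the bootstrapped targets.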