In recent years, China’s mariculture has developed rapidly, and cage culture is an important part of it. During cage culture, the culture environment, the operating condition of the cages, and the growth of the farmed organisms must be monitored regularly, so regular inspection is an essential part of cage culture. Dynamic inspection with underwater robots is an important future development trend. This paper studies autonomous cage inspection by underwater robots and accomplishes the following work:

(1) Converting autonomous cage inspection into a vision-based underwater robot pipeline tracking problem. Current cage inspection methods mainly consist of manual inspection and fixed-point sensor monitoring, which are limited by high cost and poor flexibility. A camera provides a low-cost way to obtain information. Reinforcement learning learns decisions by trial and error and can model nonlinear dynamics, and combining it with deep learning strengthens its ability to process high-dimensional data. This makes the combination of vision and deep reinforcement learning particularly suitable for underwater robotic cage inspection. Therefore, exploiting the structure of the cage, this paper treats the cage sinker ring as a pipeline and uses a deep reinforcement learning algorithm to track it, converting cage inspection into a pipeline tracking problem so that inspection can be carried out by tracking the ring.

(2) Design and implementation of a reinforcement learning training system for the cage inspection task. Experiments in real environments are costly and dangerous, and suitable simulation platforms for cage culture scenes are lacking. To this end, this paper designs and implements a reinforcement learning training platform for the cage inspection task based on ROS, UUV Simulator, and OpenAI Gym. In UUV Simulator, a simulation of an aquaculture cage
inspection scene is implemented. The scene contains cages, inspection robots, and various sensors. The simulation environment is wrapped with ROS and OpenAI Gym, which standardizes the interaction between reinforcement learning algorithms and the inspection simulation environment. Five cage culture Gym environments are preset for developing different reinforcement learning algorithms, and users can also define custom environments in the system. The system can be used for reinforcement learning training and performance evaluation, as well as for testing traditional control algorithms.

(3) PPO-based autonomous cage inspection policy learning for underwater robots. The cage inspection problem is modeled as a Markov decision process (MDP) with continuous actions. The state is the image captured by the camera, and the action is the linear and angular velocity of the autonomous underwater vehicle (AUV). The reward function is designed from the AUV's offset distance, deflection angle, and running speed. A neural network structure is designed for the policy, and Proximal Policy Optimization (PPO) is used for policy learning.

(4) Inspection policy learning based on MBR feature extraction. Given the storage and computing resource limits of edge devices, reducing the size of the policy network model is crucial for deploying inspection policies. This paper proposes a policy learning method based on feature extraction: the feature extractor composed of convolutional layers in the policy network is replaced by a feature extraction method based on the minimum bounding rectangle (MBR), which effectively reduces the policy model size and speeds up policy learning. However, the stability of this method is reduced.

(5) Inspection policy learning based on DSC-VAE feature extraction. A variational autoencoder (VAE) can effectively extract low-dimensional latent features from high-dimensional data. In this paper, the VAE encoder network is improved by replacing the standard convolution in the
encoder with depthwise separable convolution (DSC), yielding the lightweight model DSC-VAE. The convolutional layers that act as the feature extractor in the policy network are then replaced by the DSC-VAE encoder network. Experimental results show that the number of parameters is reduced while stability is maintained, achieving a balance between efficiency and stability.
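The ROS/UUV Simulator environments of item (2) cannot be reproduced in a few lines, but the standardized Gym interaction they provide, combined with a reward built from offset distance, deflection angle, and running speed as in item (3), can be illustrated with a self-contained toy. Everything in this sketch is an assumption: the class name `ToyRingTrackingEnv`, the unicycle kinematics, the locally straight ring, the low-dimensional observation (offset, heading error) used in place of a camera image, the termination penalty, and all reward weights are hypothetical. It only mimics the classic Gym `reset()`/`step()` convention and does not import `gym`.

```python
import math
import random


class ToyRingTrackingEnv:
    """Hypothetical, ROS-free stand-in for a cage-inspection Gym environment.

    The sinker ring is approximated locally as the straight line y = 0.
    Observation: (lateral offset d [m], heading error theta [rad]).
    Action: (linear velocity v [m/s], angular velocity omega [rad/s]).
    Mimics the classic Gym reset()/step() convention without importing gym.
    """

    def __init__(self, dt=0.1, max_offset=2.0, max_steps=500, target_speed=0.5,
                 w_offset=1.0, w_angle=0.5, w_speed=0.2, seed=None):
        self.dt = dt
        self.max_offset = max_offset
        self.max_steps = max_steps
        self.target_speed = target_speed
        # Hypothetical reward weights; the paper's actual values are not given.
        self.w_offset, self.w_angle, self.w_speed = w_offset, w_angle, w_speed
        self.rng = random.Random(seed)
        self.d, self.theta, self.steps = 0.0, 0.0, 0

    def reward(self, d, theta, v):
        # Penalize offset distance, deflection angle, and any deviation
        # of the running speed from the target speed.
        return (-self.w_offset * abs(d)
                - self.w_angle * abs(theta)
                - self.w_speed * abs(v - self.target_speed))

    def reset(self):
        self.d = self.rng.uniform(-0.5, 0.5)
        self.theta = self.rng.uniform(-0.3, 0.3)
        self.steps = 0
        return (self.d, self.theta)

    def step(self, action):
        v, omega = action
        # Unicycle kinematics relative to the locally straight ring.
        self.theta += omega * self.dt
        self.theta = math.atan2(math.sin(self.theta), math.cos(self.theta))
        self.d += v * math.sin(self.theta) * self.dt
        self.steps += 1
        r = self.reward(self.d, self.theta, v)
        lost = abs(self.d) > self.max_offset
        if lost:
            r -= 10.0  # hypothetical penalty for drifting too far from the ring
        done = lost or self.steps >= self.max_steps
        return (self.d, self.theta), r, done, {"lost": lost}


if __name__ == "__main__":
    # A traditional proportional controller driven through the same interface.
    env = ToyRingTrackingEnv(seed=0)
    obs, done, total = env.reset(), False, 0.0
    while not done:
        d, theta = obs
        obs, r, done, info = env.step((0.5, -1.0 * d - 2.0 * theta))
        total += r
    print(f"final offset {obs[0]:.4f} m, return {total:.2f}, lost={info['lost']}")
```

The `__main__` loop shows why a uniform interface is useful: a hand-tuned controller and a learned policy can be evaluated through exactly the same `reset()`/`step()` calls.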
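Item (3) learns the policy with PPO over continuous actions. For reference, the standard PPO clipped surrogate objective over a batch, together with the log-density of a diagonal Gaussian policy over the two action components (v, ω), can be sketched in plain Python. The Gaussian policy head, the clip range `eps=0.2`, and the function names are assumptions; the paper's network structure and hyperparameters are not given here.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian policy, e.g. over (v, omega)."""
    total = 0.0
    for a, mu, ls in zip(action, mean, log_std):
        z = (a - mu) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); taking the min with the clipped ratio
    removes the incentive to move the policy far from the one that collected
    the data.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a deep learning framework the same expression is written with tensors and its negative is minimized by gradient descent, usually alongside a value-function loss.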
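Item (4) replaces the convolutional feature extractor with MBR-based features. A minimal sketch, assuming the sinker ring has already been segmented into a list of foreground pixel coordinates, computes the minimum-area rotated bounding rectangle from the convex hull, using the standard fact that an optimal rectangle has one side collinear with a hull edge (OpenCV's `cv2.minAreaRect` computes the same kind of rectangle). The five-element feature vector (normalized center offset, orientation, normalized side lengths) and all function names are hypothetical; the paper's exact features are not specified here.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Minimum-area rotated bounding rectangle as (cx, cy, w, h, theta).

    An optimal rectangle has one side collinear with a hull edge, so it is
    enough to try every hull edge direction.
    """
    if not points:
        raise ValueError("no foreground pixels")
    hull = convex_hull(points)
    if len(hull) == 1:
        return float(hull[0][0]), float(hull[0][1]), 0.0, 0.0, 0.0
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the hull by -theta so that this edge is axis-aligned.
        us = [c * x + s * y for x, y in hull]
        vs = [-s * x + c * y for x, y in hull]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0]:
            uc, vc = (max(us) + min(us)) / 2, (max(vs) + min(vs)) / 2
            # Rotate the rectangle center back into image coordinates.
            best = (w * h, c * uc - s * vc, s * uc + c * vc, w, h, theta)
    return best[1:]


def mbr_features(points, img_w, img_h):
    """Hypothetical 5-D feature vector: center offset, orientation, side lengths."""
    cx, cy, w, h, theta = min_area_rect(points)
    if h > w:  # measure the orientation along the long side
        w, h, theta = h, w, theta + math.pi / 2
    theta = math.atan2(math.sin(theta), math.cos(theta))
    # A line's direction is defined modulo pi; fold into (-pi/2, pi/2].
    if theta > math.pi / 2:
        theta -= math.pi
    elif theta <= -math.pi / 2:
        theta += math.pi
    return [(cx - img_w / 2) / (img_w / 2), (cy - img_h / 2) / (img_h / 2),
            theta, w / img_w, h / img_h]
```

A handful of such numbers replaces an entire convolutional stack, which is consistent with the smaller model and faster learning reported in item (4); image coordinates have y pointing down, so a positive angle here is clockwise on screen.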
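The parameter saving behind DSC-VAE in item (5) follows from simple arithmetic. A standard k×k convolution from C_in to C_out channels has k²·C_in·C_out weights, whereas a depthwise separable convolution (a k×k depthwise convolution followed by a 1×1 pointwise convolution) has k²·C_in + C_in·C_out, a ratio of 1/C_out + 1/k² when biases are ignored. The helpers below count both; the four-layer encoder in the example (kernel size 4, channels 3→32→64→128→256) is a hypothetical configuration, not the paper's architecture.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights (+ biases) of a standard k x k convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dsc_params(k, c_in, c_out, bias=True):
    """Weights (+ biases) of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def encoder_params(channels, k, layer_fn):
    """Total parameters of a plain stack of conv layers with the given channel sequence."""
    return sum(layer_fn(k, c_in, c_out) for c_in, c_out in zip(channels, channels[1:]))


if __name__ == "__main__":
    channels = [3, 32, 64, 128, 256]  # hypothetical encoder, kernel size 4
    std = encoder_params(channels, 4, conv_params)
    dsc = encoder_params(channels, 4, dsc_params)
    # prints: standard: 690144, DSC: 47443, ratio: 0.069
    print(f"standard: {std}, DSC: {dsc}, ratio: {dsc / std:.3f}")
```

These counts cover only the convolutional layers; any fully connected layers producing the latent mean and variance would be added on top and are unaffected by the DSC substitution.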
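Item (5) also relies on the VAE encoder producing useful low-dimensional latent features. The two standard ingredients that shape that latent space, the reparameterized sample z = μ + σ·ε and the closed-form KL term that pulls the approximate posterior toward a standard normal prior, can be sketched in plain Python. Treating μ and log σ² as given lists is an assumption standing in for the DSC-VAE encoder's outputs; feeding μ (rather than a random sample) to the policy at decision time is a common choice, not one confirmed by the source.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Writing the sample this way lets gradients flow to mu and log_var when the
    same expression is used inside an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -1.2], [-2.0, -1.0]  # hypothetical encoder outputs
    print("sample:", reparameterize(mu, log_var, rng))
    print("KL:", kl_to_standard_normal(mu, log_var))
```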