Data-driven reinforcement learning offers distinctive advantages such as self-adaptation to working conditions and self-adjustment of parameters, and it has significant potential for the motion planning and control of autonomous vehicles in automatic parking scenarios. However, existing reinforcement-learning-based motion planning and control methods for automatic parking usually struggle to achieve both low sensitivity to model errors and an efficient learning process. In particular, reinforcement learning in a high-dimensional action space still has difficulty converging quickly, and the control accuracy of trajectory following in the spatiotemporal domain has received little research attention. To address these problems, this paper takes the automatic parking system as its research object and proposes three methods: a model-based reinforcement learning method for lateral planning that uses Monte Carlo tree search (MCTS) and artificial neural networks (ANNs, comprising a value neural network (VNN) and a policy neural network (PNN)); an integrated lateral and longitudinal planning method based on imitation learning and an improved MCTS; and a trajectory-following control method based on iterative learning control.

First, to reduce the requirements on model accuracy, a model-based reinforcement learning approach that combines tree search with state-value estimation is proposed for the lateral planning of automatic parking. The method learns to estimate the value of a parking state with the VNN, learns to predict action probabilities from reactive parking experience with the PNN, and makes steering wheel angle decisions directly with MCTS. This differs from existing model-based reinforcement learning parking methods, which first fit a vehicle model offline and then use that model to plan the motion online. Building on this, and to make the learning of the planner more efficient, the paper proposes a two-stage training pipeline for the ANNs, policy learning by weighting exploration with the returns, and simulation-based branch data augmentation.

Second, to address reinforcement learning in a high-dimensional action space, an online planning method for automatic parking based on learning from demonstration and reinforcement learning is proposed. The method first generates, offline, a globally optimal parking trajectory in the spatiotemporal domain with nonlinear programming (NLP), taking the parking time and the number of gear shifts into account, and then uses supervised learning to train the PNN to imitate the NLP policy. The PNN and MCTS are then used to generate further parking trajectory data, which are used to train the VNN. Finally, a truncated MCTS that satisfies multiple constraints and uses online heuristics realizes integrated lateral and longitudinal online planning.

Third, the lag in the vehicle's speed response seriously degrades how accurately the trajectory produced by the motion planning module is followed in the spatiotemporal domain. To solve this problem, an automatic parking trajectory-following control method based on iterative learning control of the vehicle speed is proposed. The method exploits the repetitive nature of the parking task: in the lateral direction, a simplified error model is established and model predictive control is used to follow the vehicle trajectory; in the longitudinal direction, an iterative learning control law for the vehicle speed is established. Historical speed-control error data are used to adjust the parameters of the longitudinal control law, which compensates the vehicle speed and keeps the transient tracking error low. The method avoids explicitly modeling model uncertainties and disturbances and effectively improves the accuracy of spatiotemporal trajectory following during automatic parking.

The above methods are verified on real vehicles. When the motion planning method is verified, the planned vehicle speed and steering wheel angle are output directly to execute the parking maneuver; no online motion planning is performed when the trajectory-following control is verified. A hierarchical "trajectory planning and trajectory tracking" framework is used to integrate the NLP-MCTS planner with the trajectory-following control module, combining upper-layer online planning with lower-layer learning control for parallel parking. Test conditions with different initial parking positions, parking space sizes, and road adhesion coefficients are designed, and the feasibility of the two-layer data-driven approach and its adaptability to these conditions are verified in CarSim/Simulink.
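The lateral planner combines MCTS with a PNN that supplies action priors and a VNN that scores leaf states. The sketch below shows one common way to wire these pieces together, an AlphaZero-style PUCT search. The abstract does not state the exact selection rule, so PUCT itself, the constant `c_puct`, the discretized steering angles, and the callables `policy_fn`, `value_fn`, and `step_fn` (hypothetical stand-ins for the PNN, the VNN, and a vehicle step model) are all assumptions. The `max_depth` cut-off only loosely mirrors the idea of a truncated search.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    prior: float                       # P(s, a): prior from a policy network (PNN-style)
    visits: int = 0                    # N(s, a): visit count
    value_sum: float = 0.0             # W(s, a): sum of backed-up leaf values
    children: Dict[float, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        # Mean action value Q(s, a); unvisited children default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def expand(node: Node, priors: Dict[float, float]) -> None:
    # One child per discretized steering-wheel angle (hypothetical action set).
    for action, p in priors.items():
        node.children[action] = Node(prior=p)


def select_action(node: Node, c_puct: float = 1.5) -> float:
    # PUCT rule: Q + c * P * sqrt(N_parent + 1) / (1 + N_child).
    # The "+ 1" is a common variant so the prior already matters on the first visit.
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        return child.q() + c_puct * child.prior * math.sqrt(total + 1) / (1 + child.visits)

    return max(node.children.items(), key=score)[0]


def backup(path: List[Node], value: float) -> None:
    # Single-agent task: the same value is added at every level (no sign flip).
    for node in path:
        node.visits += 1
        node.value_sum += value


def run_simulation(root: Node, state, step_fn: Callable, policy_fn: Callable,
                   value_fn: Callable, max_depth: int = 10) -> None:
    node, path = root, [root]
    for _ in range(max_depth):         # truncated search: never descend past max_depth
        if not node.children:
            expand(node, policy_fn(state))
            break
        action = select_action(node)
        state = step_fn(state, action)
        node = node.children[action]
        path.append(node)
    backup(path, value_fn(state))      # value-network-style leaf evaluation, no rollout


def most_visited(root: Node) -> float:
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


if __name__ == "__main__":
    # Toy 1-D stand-in: state = lateral offset, actions shift it by -1, 0 or +1.
    root = Node(prior=1.0)
    for _ in range(200):
        run_simulation(root, 3, lambda s, a: s + a,
                       lambda s: {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3},
                       lambda s: -abs(s))
    print(most_visited(root))
```

In the toy run the vehicle is replaced by a 1-D offset that should be driven to zero, and the root's most-visited action is the one that moves toward zero. In a real planner the leaf value would come from the trained VNN and the priors from the trained PNN rather than hand-written functions.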
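"Policy learning by weighting exploration with the returns" is the expansion of PoWER (Kober and Peters). The sketch below shows only the generic episodic PoWER update with constant Gaussian parameter exploration; it is not the thesis's training code, and how the update is coupled to the PNN is not described here. `power_update`, `toy_return`, and `train` are hypothetical names, and the Gaussian-bowl return is a toy stand-in for the return of a parking episode.

```python
import numpy as np


def power_update(theta, epsilons, returns, top_k=None):
    """Episodic PoWER-style step: theta + sum_i R_i * eps_i / sum_i R_i."""
    theta = np.asarray(theta, dtype=float)
    epsilons = np.asarray(epsilons, dtype=float)   # (n_rollouts, dim) exploration noise
    returns = np.asarray(returns, dtype=float)     # (n_rollouts,) episode returns
    if np.any(returns < 0):
        raise ValueError("PoWER weights by returns, so they must be non-negative")
    if top_k is not None:
        best = np.argsort(returns)[-top_k:]        # keep only the highest-return rollouts
        epsilons, returns = epsilons[best], returns[best]
    total = returns.sum()
    if total <= 0.0:
        return theta                               # no informative rollouts: keep theta
    return theta + returns @ epsilons / total      # return-weighted average of exploration


def toy_return(params, target):
    # Hypothetical stand-in for a parking-episode return, in (0, 1].
    return float(np.exp(-np.sum((params - target) ** 2)))


def train(target, iters=50, n_rollouts=20, sigma=0.3, top_k=5, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(target, dtype=float)
    for _ in range(iters):
        eps = rng.normal(0.0, sigma, size=(n_rollouts, theta.size))
        rets = [toy_return(theta + e, target) for e in eps]
        theta = power_update(theta, eps, rets, top_k=top_k)
    return theta


if __name__ == "__main__":
    target = np.array([1.0, -0.5])
    print(train(target))
```

Keeping only the best `top_k` rollouts mirrors the importance-sampling step usually paired with PoWER. The toy return is strictly positive, which the return weighting requires.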
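The longitudinal controller relies on iterative learning across repeated parking trials. The exact control law and the parameters it tunes are not given here, so the sketch swaps in a textbook P-type ILC that learns a feedforward speed command, applied to a hypothetical first-order speed lag v[t+1] = a*v[t] + (1 - a)*u[t]. The names `simulate_speed` and `run_ilc`, the lag coefficient `a`, the learning gain `gamma`, and the reversing speed profile are illustrative assumptions.

```python
import numpy as np


def simulate_speed(u, v0, a=0.8):
    """Hypothetical first-order speed lag: v[t+1] = a*v[t] + (1 - a)*u[t]."""
    v = np.empty(len(u) + 1)
    v[0] = v0
    for t in range(len(u)):
        v[t + 1] = a * v[t] + (1.0 - a) * u[t]
    return v


def run_ilc(v_ref, trials=30, gamma=1.0, a=0.8):
    """P-type ILC: u_{k+1}(t) = u_k(t) + gamma * e_k(t + 1), one update per parking trial."""
    u = v_ref[1:].copy()                         # trial 0: command the reference speed directly
    error_norms = []
    for _ in range(trials):
        v = simulate_speed(u, v0=v_ref[0], a=a)  # same initial condition on every trial
        e = v_ref[1:] - v[1:]                    # e_k(t + 1) for t = 0 .. N-1
        error_norms.append(float(np.linalg.norm(e)))
        u = u + gamma * e                        # feedforward correction learned from the last trial
    return u, error_norms


if __name__ == "__main__":
    # Hypothetical reversing speed profile (m/s): accelerate, cruise, decelerate to a stop.
    v_ref = np.concatenate([np.linspace(0.0, -1.0, 20),
                            np.full(40, -1.0),
                            np.linspace(-1.0, 0.0, 20)])
    _, norms = run_ilc(v_ref)
    print([round(n, 3) for n in norms[:5]], round(norms[-1], 5))
```

For this particular plant, gamma = 1 makes the 2-norm of the tracking error shrink from trial to trial by at least a factor 2a/(1 + a) (about 0.89 for a = 0.8), provided every trial starts from the same initial speed. A real vehicle would typically also need filtering for robustness against measurement noise and non-repeating disturbances.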