Autonomous decision and control is the core functionality of high-level autonomous driving intelligence. Existing hierarchical methods decompose the decision-making function into prediction, behavior selection, planning, control, and other modules, which facilitates cooperative development and driving interpretability. However, their system design relies mainly on expert rules, which can hardly cover all driving possibilities or evolve continuously from data. Besides, each module suffers from high computational complexity; for example, trajectory planning must solve a constrained nonlinear programming problem online to obtain a feasible trajectory, resulting in insufficient real-time performance. This paper studies a real-time, scalable decision and control method with safety assurance for automated vehicles. It also develops a theory of mixed-driven reinforcement learning (RL) with high learning efficiency and performance, and builds a distributed asynchronous parallel computing toolchain for solving the driving policy. This work lays the foundation for the decision and control functionality of high-level automated vehicles.

Firstly, to address the unsatisfactory overall performance of existing hierarchical decision and control methods, this paper proposes the integrated decision and control (IDC) method, which is real-time, scalable, and equipped with safety guarantees. IDC consists of two modules: static path planning and dynamic optimal tracking. The former considers only static scene information to adaptively generate a candidate path set, so it scales across driving scenes; the latter further incorporates dynamic traffic information and constructs a constrained optimal tracking problem to realize path selection and the optimization of safe vehicle commands. To ensure real-time performance, an order-free weighted encoding mechanism is developed to represent the dynamic traffic information, and the path selection and tracking policy is then obtained offline by solving the problem with a constrained RL method. With this policy, IDC achieves efficient online application.

Secondly, a policy gradient RL method driven jointly by driving data and a vehicle model is proposed to balance convergence speed and asymptotic performance. It constructs a general formulation of the policy gradient estimate and derives its analytical error bound from the data and model errors. Based on that, the data-driven and model-driven policy gradient error bounds are revealed and compared, leading to two policy gradient estimation approaches, called mixed weighting and mixed state, from which the mixed policy gradient (MPG) algorithm is finally derived. Tests show that, compared with data-driven methods, the convergence speed is improved by 3 times, and compared with model-driven methods, the policy performance is increased by 93.1%.

Thirdly, a distributed asynchronous parallel computing toolchain is established for solving the IDC driving policy, consisting of a solving module and a simulation module. The solving module employs the MPG algorithm to solve the constrained IDC problem and builds a distributed parallel computing solver on top of the Ray framework, which asynchronously carries out data sampling, data replaying, gradient computing, network updating, and policy evaluation in a distributed parallel manner. Sketches of the order-free encoding, the mixed policy gradient, and the asynchronous solver are given below.
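To make the order-free encoding described above concrete, the following is a minimal sketch of a permutation-invariant encoder for surrounding traffic participants; the layer sizes, feature dimension, and sum-pooling choice are illustrative assumptions, not the exact architecture of the thesis.

    import torch
    import torch.nn as nn

    class OrderFreeEncoder(nn.Module):
        """Permutation-invariant encoding of surrounding vehicles."""
        def __init__(self, feat_dim=6, hidden=64, out_dim=32):
            super().__init__()
            # The same per-vehicle network is applied to every vehicle,
            # so no ordering of the inputs is imposed.
            self.phi = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, out_dim))

        def forward(self, vehicles):
            # vehicles: (num_vehicles, feat_dim); summing the per-vehicle
            # embeddings makes the output invariant to the input order.
            return self.phi(vehicles).sum(dim=0)

    enc = OrderFreeEncoder()
    x = torch.randn(5, 6)            # five surrounding vehicles
    shuffled = x[torch.randperm(5)]  # same set, different order
    assert torch.allclose(enc(x), enc(shuffled), atol=1e-5)

Because the encoded vector has a fixed size regardless of how many vehicles are present, the downstream tracking policy can handle a varying number of traffic participants.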
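The mixed-weighting idea behind MPG can be sketched as a convex combination of the two gradient estimates; the specific rule shown here, weighting each estimate inversely to its error bound, is an illustrative assumption rather than the exact scheme derived in the thesis.

    import numpy as np

    def mixed_policy_gradient(g_data, g_model, err_data, err_model):
        """Blend data-driven and model-driven policy gradient estimates."""
        # Trust the estimate with the smaller error bound more heavily.
        w = err_data / (err_data + err_model)
        return w * g_model + (1.0 - w) * g_data

    g_data = np.array([0.9, -0.2])   # high-variance, low-bias estimate
    g_model = np.array([1.1, 0.1])   # low-variance, possibly biased estimate
    g_mix = mixed_policy_gradient(g_data, g_model, err_data=0.5, err_model=0.2)

Such a weighting lets the learner lean on the vehicle model early in training for fast convergence and shift toward the data-driven estimate as it becomes more reliable, which is the trade-off behind the reported gains in both convergence speed and asymptotic performance.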
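The asynchronous parallelism of the solving module can be sketched with Ray actors as follows; the class and method names (Sampler, Learner, rollout, update) are hypothetical stand-ins for the toolchain's actual components.

    import numpy as np
    import ray

    ray.init()

    @ray.remote
    class Sampler:
        """Worker that collects rollout data with the latest policy weights."""
        def rollout(self, weights):
            # The real toolchain rolls out the policy in the driving
            # simulator; random transitions stand in here.
            del weights  # a local policy copy would be refreshed from these
            return np.random.default_rng().normal(size=(64, 8))

    @ray.remote
    class Learner:
        """Holds the policy parameters and applies gradient updates."""
        def __init__(self, dim=8):
            self.weights = np.zeros(dim)
        def get_weights(self):
            return self.weights
        def update(self, batch):
            # Placeholder gradient step on the sampled batch.
            self.weights -= 1e-3 * batch.mean(axis=0)

    learner = Learner.remote()
    samplers = [Sampler.remote() for _ in range(4)]

    # Whichever sampler finishes first feeds the learner, so sampling,
    # gradient computing, and network updating overlap in time.
    pending = {s.rollout.remote(learner.get_weights.remote()): s
               for s in samplers}
    for _ in range(20):
        ready, _ = ray.wait(list(pending))
        batch = ray.get(ready[0])
        sampler = pending.pop(ready[0])
        learner.update.remote(batch)
        pending[sampler.rollout.remote(learner.get_weights.remote())] = sampler

In the full toolchain, further actor groups handle data replaying and policy evaluation in the same asynchronous fashion.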
The simulation module, in turn, designs the constrained Markov decision process elements for a signalized intersection with mixed traffic flow and builds a high-fidelity autonomous driving simulation environment for efficient data sampling and policy evaluation.

Finally, an experimental platform is built on the DiDi automated vehicle, and the driving performance of IDC is verified at a real signalized intersection with mixed traffic flow. Simulation and real-road tests show that, compared with the IPOPT solver, the computing efficiency is improved by 501.2 times without performance degradation, and compared with the hierarchical method, the computing efficiency is improved by 22.4 times while the driving effectiveness is increased by 32.3%. In addition, the automated vehicle safely completes all 32 designed scenes, with an average passing time of less than 35 s and an average single-step decision and control time of less than 15 ms.