With accelerating urbanization, complex buildings are proliferating and large numbers of people gather in them, making public safety a key concern. In the event of a crowd crush or stampede, the lack of a well-prepared emergency evacuation plan can have serious consequences. Because field evacuation drills are costly, computer-based crowd evacuation simulation has received growing attention from scholars: by simulating pedestrian behavior, it allows analysis of both macroscopic crowd phenomena and the microscopic decision making of individual pedestrians. This is important for formulating sound emergency plans and designing a reasonable layout of building facilities in advance. Conventional crowd simulation methods often suffer from short-sightedness: pedestrians can only choose the best action at the current moment, without the foresight that real humans have, which is likely to produce unreasonable movement behaviors and reduce simulation accuracy. The results are therefore unreliable as a reference for making emergency plans or assessing the rationality of facility layouts. The strong performance of deep reinforcement learning in sequential decision making has brought new ideas to the field of crowd simulation. Some scholars have combined it with microscopic models such as Optimal Reciprocal Collision Avoidance (ORCA) and shown that this combination can effectively improve pedestrians' decision making. However, these studies share common problems: high training time cost, heavy consumption of computational resources, and difficulty escaping local optima. Meanwhile, research on crowd simulation in realistic scenarios and on architectural space layout is still lacking. To address these problems, this study proposes a hierarchical crowd simulation
model based on deep reinforcement learning and expert trajectory guidance. First, a pedestrian decision-making method based on the Dueling Double Deep Q-Network (D3QN) algorithm is proposed as the upper-level path guidance. In accordance with the characteristics of pedestrian simulation tasks, the state space takes as input a continuous grayscale image centered on a single agent, the action space is discretized, and the sparse reward is densified. Second, a general pedestrian simulation framework named D3QN-ORCA is established by combining this method with the ORCA algorithm at the lower level. For extreme scenarios that are prone to local optima, the study proposes the expert-guided D3QN-ORCA method (EGD3QN-ORCA), which introduces an expert trajectory imitation degree into the reward function. It requires neither a large prior sample set nor a complex database of expert examples, only a simple expert guidance path, and it greatly improves pedestrians' exploration efficiency. Finally, a distributed experience collection framework is introduced, which creates multiple independent environments for parallel data sampling to further reduce model training time. To validate the proposed method, this study designed several experimental scenarios of varying complexity and compared the method with a traditional agent-based model and a deep reinforcement learning method. The experimental results show that the proposed D3QN-ORCA model produces more reasonable pedestrian movement patterns, adjusts behavior strategies earlier in response to congestion, and generates paths that are closer to reality, thereby improving simulation performance. With expert trajectory guidance, EGD3QN-ORCA reduced the time required to find the global maximum cumulative reward by 56%-64% compared with D3QN-ORCA, greatly accelerating model convergence. By providing macroscopic guidance while
preserving pedestrians' freedom of exploration, this method helps pedestrians quickly get past local obstacles, thus improving training efficiency. Finally, this paper models a real subway station, differentiates the physical attributes of pedestrians to realize a heterogeneous crowd evacuation simulation, and investigates the effects of the number and width of gate lanes and the location of accessible elevators in the station. It quantitatively evaluates the rationality of the layout of key facilities in the subway station under evacuation and further verifies the practicality of the proposed method. The experimental results indicate that the existing gate layout can effectively divert the crowd when pedestrians are few or scattered, but when the crowd is dense, the width of the gate lanes becomes an important factor limiting passage efficiency. The width of individual gate lanes should therefore be appropriately increased. Meanwhile, the location of accessible elevators affects pedestrians' path planning and exit selection and can thus help divert the crowd. Given the differences in capacity among the gates, elevator placement is also an important factor affecting overall evacuation efficiency.
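
As a rough illustration of the upper-level decision step named in this abstract, the sketch below shows the two standard ingredients of D3QN (the dueling aggregation of state value and action advantages, and the Double DQN target) together with one possible form of an expert-trajectory term in the reward. The abstract does not give the actual formula for the imitation degree, so `imitation_bonus`, its `weight` parameter, and the nearest-waypoint distance it uses are assumptions made purely for illustration; the function names and the use of plain Python lists in place of a neural network are likewise hypothetical.

```python
import math


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    and the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


def imitation_bonus(position, expert_path, weight=0.1):
    """Hypothetical imitation term (an assumption, not the paper's formula):
    penalize the distance from the agent to the nearest expert waypoint."""
    nearest = min(math.dist(position, waypoint) for waypoint in expert_path)
    return -weight * nearest


# Usage: one transition for a single pedestrian agent with three discrete actions.
q_online_next = dueling_q(1.0, [0.0, 2.0, 1.0])
q_target_next = dueling_q(0.5, [1.0, 0.0, -1.0])
step_reward = 1.0 + imitation_bonus((0.0, 0.0), [(3.0, 4.0), (6.0, 8.0)])
target = double_dqn_target(step_reward, False, q_online_next, q_target_next, gamma=0.9)
```

A shaping term of this kind only needs a hand-drawn guidance path, which matches the abstract's statement that no large prior sample set or expert database is required; how strongly it should be weighted against the task reward is left open here.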