| The operation of high-speed railway (HSR) involves collaboration among multiple departments. The operating environment is dynamic, and internal and external disturbances are unavoidable. In addition, China's HSR is characterized by high train density and short headways. Once a train is disturbed, the delay easily propagates horizontally and vertically through the network, which deepens its impact. Accurately grasping the distribution pattern of train delays, understanding the effects of the delay sources, and predicting train delays accurately can therefore give dispatchers sufficient time and decision support for real-time scheduling, reduce the negative impact of delay propagation as far as possible, and further improve the reliability and punctuality of train operation. The main research contents are as follows: (1) An overview of HSR delay theory is provided. First, the causes of HSR train delays are summarized. Second, delays are classified according to their nature. The process of delay propagation is then analyzed in detail from two aspects, the propagation mechanism and the absorption mechanism. Finally, the fundamental theory of using machine learning to forecast train delays is explained, laying the theoretical groundwork for the rest of the research. (2) The train operation data and the real-world abnormal-event data are preprocessed. Based on the actual train operation data, the temporal and spatial distributions of the arrival and departure delays of high-speed trains are analyzed statistically. Based on the real-world abnormal-event data, the best-fitting distribution functions are determined for the duration of each type of abnormal event, the number of delayed trains, and the primary delay time, and a distribution model of the influence of abnormal events on the studied line is established from a macroscopic perspective. The results show that the Burr and log-normal distributions fit the duration of the various abnormal events, the number of delayed trains, and the primary delay time well. (3) A data-driven method for predicting the arrival delay of high-speed trains is proposed. First, the factors that may affect train arrival delay are analyzed and extracted from the perspectives of time, space, and infrastructure. Second, a feature extraction method based on the Random Forest algorithm is designed to construct the optimal feature set. Third, a high-speed train arrival delay prediction model based on the BO-XGBoost algorithm is established, in which XGBoost learns the complex relationship between the input features and train arrival delay, and Bayesian Optimization (BO) tunes the XGBoost hyperparameters. Finally, actual data from two railway lines, Wuhan-Guangzhou and Xiamen-Shenzhen, are used for a case study. The results indicate that the forecasting accuracy of the BO-XGBoost model is superior to that of current benchmark models such as ANN, DELM, and RF. (4) Considering the characteristics of abnormal events, a Stacking-based multi-model fusion model for predicting train arrival delays under abnormal events is proposed, building on the BO-XGBoost prediction model. The model is trained and evaluated on the arrival delay data generated by the five kinds of abnormal events that occur most frequently in the up direction of the Wuhan-Guangzhou high-speed railway, which demonstrates the effectiveness of the proposed prediction model. In particular, the prediction accuracy for major delay events of more than 20 minutes is greatly improved. Finally, through empirical analysis, the applicable scenarios of the BO-XGBoost model and of the Stacking ensemble model are identified. |
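
As an illustration of the distribution fitting mentioned in item (2), the following is a minimal sketch of fitting a log-normal distribution to abnormal-event durations by maximum likelihood. It is not the procedure used in the study: the function names `fit_lognormal` and `lognormal_cdf` and the sample durations are hypothetical, and the Burr distribution, which has no closed-form estimator, is left out.

```python
import math


def fit_lognormal(durations):
    """Return the maximum-likelihood (mu, sigma) of a log-normal distribution.

    For a log-normal, the MLE is the mean and the (population) standard
    deviation of the log-transformed observations.
    """
    if not durations or any(d <= 0 for d in durations):
        raise ValueError("durations must be a non-empty sequence of positive numbers")
    logs = [math.log(d) for d in durations]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """Return P(X <= x) for a log-normal with parameters mu and sigma > 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Hypothetical event durations in minutes (illustrative only, not study data).
sample_durations = [5, 8, 12, 15, 20, 22, 30, 45, 60, 95]

mu, sigma = fit_lognormal(sample_durations)
median = math.exp(mu)  # the median of a log-normal is exp(mu)
p_over_20 = 1.0 - lognormal_cdf(20, mu, sigma)  # share of events longer than 20 min

print(f"mu={mu:.3f}, sigma={sigma:.3f}, median={median:.1f} min, "
      f"P(duration > 20 min)={p_over_20:.2f}")
```

Comparing candidate distributions, as the abstract describes, would additionally require a goodness-of-fit criterion such as a Kolmogorov-Smirnov statistic or an information criterion; the abstract does not state which one was used.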