| As train speeds in China continue to rise, passengers demand higher punctuality. In actual operation, however, trains deviate from the planned train diagram because of various disturbances. At present, dispatchers estimate train delay times from their own dispatching experience and adjust the train diagram online. If the arrival delay time could be predicted in advance, dispatchers would have more accurate information, which would further improve scheduling optimization. Based on actual delay cases, this paper studies the patterns of train delays under disturbance and designs methods for predicting train arrival time. The main contents of the paper are as follows. (1) Based on the descriptions of the train delay cases, the TF-IDF text mining method was selected for keyword extraction. The factors affecting train delays were analyzed in combination with actual train operation. The features of train delays were sorted out according to a field survey and the current state of research. By collecting train schedules and line information and combining them with the train delay cases, the features were converted to numerical form to construct the train delay data set. (2) The correlation and redundancy of the features were analyzed. To handle weakly correlated and redundant features, an improved feature selection algorithm based on Max-Relevance and Min-Redundancy (mRMR) was proposed. The maximal information coefficient (MIC) was used in place of the original mutual information as the evaluation criterion for the correlation between variables. An evaluation criterion fusing MIC and the Spearman coefficient was designed to address two shortcomings of the original algorithm: mutual information is insensitive to discrete values, and only a single measurement criterion is used. The effectiveness of the resulting MS-mRMR algorithm on the delay data set was demonstrated by comparing its prediction accuracy with that of the feature set selected by the original mRMR algorithm. (3) Based on the delay data set, the random forest (RF), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost) algorithms were selected to build regression models for predicting delay time. The coefficient of determination R² was used as the weight to improve the random forest algorithm, and the accuracy of the weighted random forest (wRF) was compared with that of RF on the delay data set. Particle swarm optimization (PSO) and grid search (GS) were used to optimize the model hyperparameters. The optimal algorithm combination was obtained by comparing the accuracy of the three optimized prediction models. (4) A train delay prediction system based on Django was developed. The system uses the optimal algorithm combination as its back-end prediction engine module, implementing the prediction model studied in this paper. The paper contains 35 figures, 26 tables, and 78 references. |
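
The abstract states only that R² serves as the weight in the weighted random forest; it does not say how the weights are computed or applied. The sketch below shows one plausible reading, under these assumptions: each base learner is scored by R² on held-out validation data, negative scores are clipped to zero, and the normalized scores weight the learners' predictions. The function names and the sample delay values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def r2_weights(y_val, val_preds_per_model):
    """Turn each model's validation R^2 into a normalized, non-negative weight.

    Assumption: models that do worse than the mean predictor (R^2 < 0)
    receive zero weight.
    """
    scores = [max(r2_score(y_val, preds), 0.0) for preds in val_preds_per_model]
    total = sum(scores)
    if total == 0:
        # No model beats the mean predictor: fall back to a plain average.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def weighted_predict(weights, preds_per_model):
    """Combine per-model predictions for the same samples by weighted average."""
    return [
        sum(w * p for w, p in zip(weights, sample_preds))
        for sample_preds in zip(*preds_per_model)
    ]


# Hypothetical validation delays (minutes) and three base learners' predictions.
y_val = [5.0, 12.0, 20.0, 8.0]
val_preds = [
    [5.0, 12.0, 20.0, 8.0],   # matches the validation targets exactly
    [6.0, 11.0, 18.0, 9.0],   # close to the targets
    [20.0, 5.0, 8.0, 12.0],   # worse than predicting the mean
]
weights = r2_weights(y_val, val_preds)

# Predictions of the same three learners on two new (hypothetical) samples.
new_preds = [[6.0, 15.0], [7.0, 13.0], [30.0, 2.0]]
combined = weighted_predict(weights, new_preds)
```

In a random forest the base learners would be the individual trees, and the validation data could be each tree's out-of-bag samples; that choice is likewise an assumption rather than something the abstract specifies.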