| With the rapid development of China's railways, the shortage of railway transportation capacity has eased considerably in recent years, but local capacity shortages remain. A significant characteristic of railway transportation is that trains make many intermediate stops, so passengers with different origins and destinations compete for the same seat resources. Railway operators currently address this problem with two strategies: ticket allotment before sales begin and quota adjustment afterwards. Because the allotment plan can rarely match the real passenger flow accurately, the ticket sales organization strategy must be adjusted during the sales process according to actual ticket sales. This work is still done manually by managers, which makes it difficult to meet current operational management needs. In this context, this study aims to construct machine learning models that predict trains with improper allotments so that managers can take corrective action as early as possible. The paper first sorts out the current ticket sales organization process in three stages (before, during, and after the pre-sale period) and discusses in detail the early-warning problem in railway ticket sales forecasting. Second, we analyze the passenger purchase trends contained in historical operation data and extract feature attributes and sales results from that data. Then seven typical machine learning methods (logistic regression, support vector machine, K-nearest neighbor, decision tree, neural network, random forest, and AdaBoost) are used to construct binary and multi-class classification models, and the best models are combined through a voting method. Given the imbalanced nature of the original dataset, five resampling methods (random undersampling, SMOTE, SMOTEENN, Borderline-SMOTE, and ADASYN) are also applied to optimize the training dataset. Finally, using real data from the Beijing-Shanghai high-speed railway, multiple sets of controlled experiments verify the performance of the seven models and analyze their behavior under different conditions. The results suggest that SMOTEENN is the best resampling method for railway ticket sales data. On the dataset used in this paper, random forest outperforms the other models, with accuracy above 95% under various conditions. |
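
Two of the building blocks named in the abstract, random undersampling and a voting-based combination of classifiers, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names `random_undersample` and `hard_vote` are hypothetical, the balancing target (the size of the rarest class) and the majority-vote rule are assumptions, and a real experiment would more likely rely on library implementations such as `SMOTEENN` in imbalanced-learn and `VotingClassifier` in scikit-learn.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop samples so that every class keeps as many
    samples as the rarest class (assumed balancing target)."""
    rng = random.Random(seed)
    # Group feature rows by their class label.
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement down to the target size.
        for features in rng.sample(rows, target):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def hard_vote(predictions):
    """Combine several models' prediction lists by majority vote.

    `predictions` holds one list of labels per model, all of equal length.
    On a tie, the label that appears first among the votes wins.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    # Toy imbalanced data: 8 "normal" trains (0), 2 "improper allotment" trains (1).
    X = [[i] for i in range(10)]
    y = [0] * 8 + [1] * 2
    X_bal, y_bal = random_undersample(X, y, seed=1)
    print(Counter(y_bal))
    # Three hypothetical models voting on three samples.
    print(hard_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]]))
```

Undersampling discards majority-class information, which may be one reason hybrid methods such as SMOTEENN, which the abstract reports as performing best, are preferred in practice.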