In recent years, the growth of the global air transport industry has led to a rapid increase in the number of flights. However, frequent delays seriously disrupt normal flight operations. Flight delays are affected by many factors. To predict them effectively, we must uncover the hidden correlations among the factors related to flight delays and choose appropriate prediction methods. This paper therefore uses machine learning methods to model flight delays, and carries out an experimental analysis on relevant data to verify that the prediction model is sound enough to serve as a reference for decision making by the relevant departments. First, the paper introduces the background of the development of global civil aviation, explains the basic theoretical concepts of flight delay, and analyzes the main factors affecting flight delay and the degree of their influence. Based on current academic research at home and abroad and an analysis of practical production needs, we start from historical operation data and adopt machine learning methods for delay prediction, in order to meet expectations of higher accuracy. Then, data preprocessing is carried out, mainly to fill in missing values and add new indicators. The relationships between the main variables in the data set and flight delay are analyzed descriptively and visualized. A hybrid feature selection method is obtained by combining the correlation coefficient, recursive feature elimination, and embedded feature selection based on random forest and XGBoost models. Finally, on the new data set produced by feature selection, flight delay prediction models are built using support vector machines, naive Bayes, random forest, XGBoost, LightGBM, and LSTM, respectively, and a fusion model is built using the Stacking algorithm. The flight delay prediction models are used to classify
flight delay status, and the classification performance of the different models is evaluated using a variety of evaluation metrics. Among these metrics, the AUC and the F1 score are used as the main comprehensive indicators. In addition, because misclassifying delayed flights as on-time flights causes serious losses, special attention is paid to recall. Among the seven methods in this paper, the prediction model based on the Stacking fusion model has the best overall performance: its AUC is 0.971, F1 score is 0.849, accuracy is 0.957, precision is 0.867, and recall is 0.831, indicating good prediction results.
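
As a small illustration of how the reported metrics relate to one another, the F1 score is the harmonic mean of precision and recall, so the three figures given for the Stacking fusion model can be checked for mutual consistency. The sketch below is not taken from the paper's code; the function name `f1_from_precision_recall` is hypothetical, and only the numeric values come from the text above.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention assumed here: F1 is 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the Stacking fusion model.
reported = {"precision": 0.867, "recall": 0.831, "f1": 0.849}

f1 = f1_from_precision_recall(reported["precision"], reported["recall"])
print(round(f1, 3))  # 0.849
```

The recomputed value agrees with the reported F1 score to three decimal places, which suggests that precision, recall, and F1 were all computed for the same (delayed) class.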