| Over time, big data technology has advanced steadily and created great value in many fields. At the same time, the growing demand for air transport has made flight delays an increasingly prominent problem. Bringing these two trends together, making real-time and accurate predictions of flight delays with big data methods is valuable work. This thesis studies flight delay prediction based on big data methods. First, the criteria for flight delays and the factors that cause them are analyzed in detail, and the current state of, and challenges facing, flight delay prediction by government agencies and by the public are discussed separately. Second, the infrastructure and processing flow of big data systems are introduced. Third, the thesis puts forward two important models that should be established within the big data processing flow: a data model and a prediction model. For the former, it proposes a data model tailored to flight delay forecasting; for the latter, it proposes an incremental training method that progresses from individual learning to ensemble learning. In the model-training case study, performance indicators of the trained models were collected and compared for a Decision Tree and two ensemble-learning models. Repeated training runs show that the ensemble-learning models generalize better than the individual learning model but take longer to train. Gradient Boosted Trees, which belongs to the serial (sequential) category of ensemble learning, improved only marginally as the number of iterations increased. Therefore, the thesis recommends Random Forest, which belongs to the parallel category of ensemble learning, for predicting flight delays with better performance. Finally, the concept of stream computing is introduced: by receiving real-time data and computing on it as it arrives, predictions could become faster and more accurate, which is proposed as a further research direction for big data technology in flight delay prediction. |
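The abstract mentions a delay criterion and a data model without spelling either out. As a minimal sketch only, the record below assumes a 15-minute threshold (a widely used convention in airline on-time statistics; the thesis's own criterion may differ) and a set of hypothetical fields and helpers (`FlightRecord`, `scheduled_departure`, `actual_departure`, `features`) that are illustrative and not taken from the thesis.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: a departure 15 minutes or more behind schedule counts as delayed.
DELAY_THRESHOLD_MIN = 15


@dataclass
class FlightRecord:
    """Hypothetical flight record; field names are illustrative, not the thesis's schema."""
    flight_no: str
    origin: str
    destination: str
    scheduled_departure: datetime
    actual_departure: datetime

    def delay_minutes(self) -> float:
        # Subtracting full datetimes handles flights that cross midnight.
        return (self.actual_departure - self.scheduled_departure).total_seconds() / 60

    def is_delayed(self, threshold: float = DELAY_THRESHOLD_MIN) -> bool:
        return self.delay_minutes() >= threshold

    def features(self) -> dict:
        # Hypothetical model inputs derived from the schedule (known before departure).
        return {
            "origin": self.origin,
            "destination": self.destination,
            "dep_hour": self.scheduled_departure.hour,
            "weekday": self.scheduled_departure.weekday(),
        }


record = FlightRecord("XX123", "AAA", "BBB",
                      datetime(2020, 1, 1, 23, 50), datetime(2020, 1, 2, 0, 10))
print(record.delay_minutes(), record.is_delayed(), record.features())
```

Keeping the label (`is_delayed`) separate from the inputs (`features`) reflects one common design choice: the label depends on the actual departure, which is unknown at prediction time, while the features use only scheduled information.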
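The contrast between the parallel category (Random Forest) and the serial category (Gradient Boosted Trees) of ensemble learning can be illustrated with a small pure-Python sketch. It makes several simplifying assumptions that are not the models or tooling used in the thesis: one-split regression stumps stand in for full trees, boosting uses squared loss, and the two-feature toy dataset (`X`, `y`) and every function name (`fit_stump`, `fit_random_forest`, `fit_gbt`, `accuracy`) are hypothetical.

```python
import random


def avg(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def fit_stump(X, y, features):
    """One-split regression stump: pick the (feature, threshold) pair with the lowest
    squared error and predict the mean target on each side of the split."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for t in values[:-1]:  # the largest value would leave the right side empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            lm, rm = avg(left), avg(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # every candidate feature is constant: predict the overall mean
        c = avg(y)
        return lambda row: c
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_random_forest(X, y, n_trees=50, seed=0):
    """Bagging with random feature choice: each stump gets its own bootstrap sample and
    one randomly chosen feature. No stump depends on another, so the loop could run
    in parallel."""
    rng = random.Random(seed)
    n_features = len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = [rng.randrange(n_features)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return lambda row: avg(s(row) for s in stumps)  # average the stump outputs


def fit_gbt(X, y, n_iter=50, lr=0.5):
    """Gradient boosting with squared loss: every stump is fitted to the residuals left
    by all earlier stumps, so stage k cannot start until stage k-1 has finished."""
    base = avg(y)
    pred = [base] * len(X)
    stumps = []
    for _ in range(n_iter):
        # For the loss (y - F)**2 / 2, the negative gradient is simply the residual y - F.
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residual, range(len(X[0])))
        stumps.append(stump)
        pred = [pi + lr * stump(row) for pi, row in zip(pred, X)]
    return lambda row: base + lr * sum(s(row) for s in stumps)


def accuracy(model, X, y):
    """Share of rows where thresholding the score at 0.5 matches the 0/1 label."""
    return sum((model(row) > 0.5) == bool(yi) for row, yi in zip(X, y)) / len(y)


# Hypothetical toy data: (scheduled departure hour, weather severity 0-3). A flight is
# labelled delayed (1) if it departs at 17:00 or later or the weather severity is >= 2.
X = [(h, w) for h in range(24) for w in range(4)]
y = [1 if h >= 17 or w >= 2 else 0 for h, w in X]

for name, model in [("random forest", fit_random_forest(X, y)), ("gbt", fit_gbt(X, y))]:
    print(name, accuracy(model, X, y))
```

The design point lies in the two loops: `fit_random_forest` builds each stump from its own bootstrap sample without looking at the others, which is why bagging distributes well across workers, whereas `fit_gbt` must finish stage k-1 before it can compute the residuals for stage k. Each extra boosting stage therefore adds sequential training time, and once the residuals are small a new stage contributes little, which is one plausible reason for the diminishing returns from additional iterations.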