In recent years, the growing number of airlines and the rapid development of high-speed rail have put the civil aviation industry under dual pressure: competition within the industry and competition from substitutes. Severe customer churn has become a major problem troubling airlines. For aviation enterprises it is therefore particularly critical to build an efficient passenger churn prediction model that identifies high-value passengers at risk of leaving, so that their churn can be prevented and the resulting losses minimized. Taking airline passenger data as the research object, this paper constructs a passenger value segmentation model with the practicality of the churn prediction model in mind, and then builds an efficient churn prediction model on top of that value segmentation. The main work is summarized as follows.

First, combining the business logic of airlines, a passenger value segmentation model is established, and passengers are clustered with the K-means++ algorithm into three groups: "low value passengers", "medium value passengers" and "high value passengers". To analyze passenger churn, a churn rule is defined by constructing a churn rate function, which divides the churn status of passengers into three categories: "not lost", "quasi lost" and "already lost". By extending the evaluation metrics used for binary classification models, this paper proposes evaluation metrics for the resulting three-class problem. Because the data set contains too many attributes, features are selected using random forest variable importance in order to improve the efficiency and accuracy of the model. Churn prediction models based on k-nearest neighbors, naive Bayes, support vector machine, random forest, GBDT and XGBoost are then built, and the best-performing model under the multi-class evaluation metrics is selected for further optimization. The results show that XGBoost clearly outperforms the other classifiers, so it is chosen as the baseline for the follow-up research.

Second, class imbalance is a widespread problem in passenger churn prediction. General classification algorithms assign the same cost to every misclassification by default, yet classifying a lost passenger as not lost causes far greater losses than predicting a retained passenger as lost. This paper therefore introduces cost-sensitive learning: different penalties are set for different classification errors, and the XGBoost prediction model is improved by modifying the loss function of the XGBoost algorithm so as to minimize the cost of misclassification. Empirical results show that the cost-sensitive XGBoost model performs better at reducing the first type of classification error.

Finally, to further improve prediction accuracy and operating efficiency, the XGBoost model is optimized on the basis of the previous research in two respects. On the feature engineering side, the work includes single-feature screening, in which XGBoost is trained incrementally on different feature combinations and the best combination is selected; outlier detection using K-means clustering; and experiments with high-pass, low-pass and band-pass filtering to improve the model's predictions. On the hyperparameter side, optimization algorithms such as particle swarm optimization are applied to multi-objective tuning of the XGBoost hyperparameters, yielding the parameter combination with the best predictive performance. The results show that this further optimization improves the prediction accuracy of the XGBoost model.

Based on the above analysis, this paper constructs an efficient churn prediction model for frequent flyers in civil aviation. Using the model together with the passenger value segmentation results, enterprises can identify churn among passengers of different value levels and formulate differentiated retention strategies, which is of considerable practical value.
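The abstract does not state exactly how the binary evaluation metrics are extended to the three-class churn problem. One common approach, shown here only as a sketch under that assumption, is to compute one-vs-rest precision, recall and F1 for each class from a 3 x 3 confusion matrix and then take their unweighted (macro) average. The label encoding (0 = "not lost", 1 = "quasi lost", 2 = "already lost") and all function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical label encoding: 0 = "not lost", 1 = "quasi lost", 2 = "already lost".
N_CLASSES = 3


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = N_CLASSES
) -> List[List[int]]:
    """cm[t][p] counts samples whose true class is t and predicted class is p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return num / den if den else 0.0


def macro_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = N_CLASSES
) -> Dict[str, object]:
    """One-vs-rest precision/recall/F1 per class, then their unweighted mean."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    precision, recall, f1 = [], [], []
    for k in range(n_classes):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n_classes)) - tp  # predicted k, truly another class
        fn = sum(cm[k]) - tp  # truly k, predicted as another class
        p = safe_div(tp, tp + fp)
        r = safe_div(tp, tp + fn)
        precision.append(p)
        recall.append(r)
        f1.append(safe_div(2 * p * r, p + r))
    return {
        "accuracy": safe_div(sum(cm[k][k] for k in range(n_classes)), len(y_true)),
        "per_class_recall": recall,
        "macro_precision": sum(precision) / n_classes,
        "macro_recall": sum(recall) / n_classes,
        "macro_f1": sum(f1) / n_classes,
    }


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the thesis.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    print(macro_metrics(y_true, y_pred))
```

Macro averaging gives each churn class equal weight regardless of its frequency, which matters under the class imbalance the abstract describes; the per-class recall of the "already lost" class additionally shows how many actual churners the model catches.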
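The abstract says that the XGBoost loss function is modified so that different classification errors receive different penalties, but it does not give the exact form. A common cost-sensitive variant, assumed here purely for illustration, multiplies each sample's softmax cross-entropy by a cost attached to its true class, so that misclassifying an "already lost" passenger is penalized more heavily. For a per-sample loss of -w[y] * log p_y with p = softmax(z), the gradient is w[y] * (p_k - 1[k = y]) and the diagonal of the Hessian is w[y] * p_k * (1 - p_k); these are the two quantities a gradient-boosting library asks of a custom objective. The class weights [1, 2, 5] and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical cost per true class (0 = "not lost", 1 = "quasi lost", 2 = "already lost").
# These values are assumptions for illustration, not taken from the thesis.
CLASS_WEIGHTS = [1.0, 2.0, 5.0]


def softmax(margins: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of raw scores."""
    m = max(margins)
    exps = [math.exp(z - m) for z in margins]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_softmax_obj(
    margins: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float] = CLASS_WEIGHTS,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Gradient and diagonal Hessian of class-weighted softmax cross-entropy.

    Per-sample loss: -w[y] * log p_y, where p = softmax(margins row).
    dL/dz_k      = w[y] * (p_k - 1[k == y])
    d2L/dz_k^2   = w[y] * p_k * (1 - p_k)
    """
    grads, hess = [], []
    for row, y in zip(margins, labels):
        p = softmax(row)
        w = weights[y]
        grads.append([w * (pk - (1.0 if k == y else 0.0)) for k, pk in enumerate(p)])
        # A small floor keeps the Hessian strictly positive for confident predictions.
        hess.append([max(w * pk * (1.0 - pk), 1e-16) for pk in p])
    return grads, hess


if __name__ == "__main__":
    # Uniform scores for a passenger who has actually churned (label 2).
    g, h = weighted_softmax_obj([[0.0, 0.0, 0.0]], [2])
    print(g, h)
```

With equal raw scores, a churned passenger (weight 5) produces gradients five times larger than a retained one (weight 1), so the boosted trees are pushed harder to fix that error. In the xgboost Python package, a function like this would be adapted into a callable and passed as the `obj` argument of `xgboost.train`; the array shapes it must accept and return depend on the library version, so the custom-objective documentation should be checked before use.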