| In the face of ever-growing competition in the aviation industry, an efficient and accurate customer loyalty prediction model helps improve corporate competitiveness. At present, aviation datasets suffer from severe class imbalance and high feature dimensionality, and research on airline customer value prediction at home and abroad is still at an early stage, which poses great challenges for loyalty prediction. In addition, in the aviation domain, unlabeled customer data are easy to obtain, whereas large amounts of labeled customer data are expensive to obtain, and a loyalty prediction model built from only a small amount of labeled data performs poorly. Therefore, starting from the actual needs of enterprises, this paper establishes an efficient and objective loyalty prediction model suitable for airlines, which helps enterprise managers predict customer loyalty accurately. The main contributions of this paper are as follows: 1) To address the severe class imbalance and high feature dimensionality of aviation customer datasets, this paper optimizes sample subsets using adaptive particle swarm optimization to obtain balanced datasets, which improves classification performance on the minority class. A convolutional neural network then extracts the main features from the balanced datasets, and the automatically derived feature vector is used as the input to the random forest algorithm to construct the customer loyalty prediction model. The experimental results show that the proposed model achieves good prediction performance in aviation customer loyalty prediction. 2) This paper proposes a customer loyalty prediction framework based on semi-supervised random forest learning, which is better suited to the high cost of manually labeling aviation customer datasets in real-world scenarios. The proposed model uses the Lasso method to calculate the
feature weights used for feature selection in the random subspace, which generates multiple decision tree classifiers that differ substantially from one another and reduces the impact of unimportant features on the prediction model. Furthermore, because too many classifiers lead to lower diversity and higher time complexity, this paper introduces the idea of grouping into the random forest algorithm. The other two auxiliary decision tree groups label the unlabeled samples, and the samples on which they give the same prediction with high confidence are added proportionately to the main decision tree group, which reduces the impact of mislabeled samples. The experiments show that the model outperforms the supervised learning model. |
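
The two mechanisms in the second contribution can be illustrated with a minimal sketch. The abstract does not say how the Lasso weights enter the random subspace step, nor how confidence is measured, so the following are assumptions: feature weights are taken to be the absolute values of Lasso coefficients, features are sampled without replacement with probability proportional to weight, and a sample counts as confident when both auxiliary groups assign their agreed label a probability at or above a threshold. The names `weighted_subspace`, `select_pseudo_labels`, and `threshold` are hypothetical and used only for illustration.

```python
import random


def weighted_subspace(weights, k, rng):
    """Draw k distinct feature indices with probability proportional to weight.

    Assumption: `weights` are non-negative feature weights, e.g. absolute
    Lasso coefficients. Features with zero weight are never selected.
    """
    candidates = [i for i, w in enumerate(weights) if w > 0]
    if k > len(candidates):
        raise ValueError("k exceeds the number of features with non-zero weight")
    chosen = []
    for _ in range(k):
        pool_weights = [weights[i] for i in candidates]
        pick = rng.choices(candidates, weights=pool_weights, k=1)[0]
        chosen.append(pick)
        candidates.remove(pick)  # sample without replacement
    return sorted(chosen)


def select_pseudo_labels(proba_a, proba_b, threshold=0.9):
    """Return (sample_index, label) pairs that both auxiliary groups agree on.

    Assumption: proba_a and proba_b hold per-class probabilities predicted by
    the two auxiliary tree groups for the same unlabeled samples.
    """
    selected = []
    for idx, (pa, pb) in enumerate(zip(proba_a, proba_b)):
        label_a = max(range(len(pa)), key=pa.__getitem__)
        label_b = max(range(len(pb)), key=pb.__getitem__)
        confident = min(pa[label_a], pb[label_b]) >= threshold
        if label_a == label_b and confident:
            selected.append((idx, label_a))
    return selected


# Hypothetical inputs: five feature weights and three unlabeled samples.
feature_weights = [0.0, 0.8, 0.1, 0.0, 0.5]
subspace = weighted_subspace(feature_weights, 2, random.Random(0))

group_a = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
group_b = [[0.97, 0.03], [0.10, 0.90], [0.05, 0.95]]
pseudo_labeled = select_pseudo_labels(group_a, group_b, threshold=0.9)
print(subspace, pseudo_labeled)
```

Each tree in a group would be trained on its own sampled subspace, so heavily weighted features appear in more trees while zero-weight features are excluded. Only the samples returned by `select_pseudo_labels` would be candidates for addition to the main group's training set. The proportional addition mentioned in the abstract is not modeled here.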