
Research And Application Of Imbalanced Dataset Classification Prediction Algorithm

Posted on: 2018-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Zhang
GTID: 2348330518496122
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the development of the mobile Internet, distributed storage, and parallel processing, data is growing explosively in every field. How to apply data mining technology to daily production and intelligent operation has become a hot topic in recent years. Classification prediction algorithms are widely used in data mining. Since most of the data encountered in practice is imbalanced, the study of classification prediction algorithms for imbalanced datasets is of great significance. This thesis focuses on the research and application of classification prediction algorithms for imbalanced data. Its main contributions are as follows.

Firstly, an improved AdaBoost algorithm based on SMOTE is proposed. In the proposed algorithm, SMOTE is first applied to the imbalanced dataset to reduce the imbalance ratio between the sample classes. Bootstrap sampling is then performed on the processed data. A base classifier is trained on each training subset, and the weight of the base classifier is calculated. To improve the performance of AdaBoost, the sample weights are updated with the SMOTE algorithm. Experiments on real data verify the effectiveness and generalization ability of the proposed algorithm.

Secondly, the criteria for evaluating classification algorithms on imbalanced datasets are investigated. Based on the characteristics of imbalanced datasets, a profit model is proposed to evaluate customer churn prediction, taking into account the misclassification cost and the customer retention model. Experiments on real data verify the effectiveness of the proposed profit model.

Finally, from the application point of view, the improved classification prediction algorithm and the profit function model are applied to customer churn data and business marketing data to reveal potentially related information in the data. The experimental results on real application data demonstrate the effectiveness of the algorithm and the evaluation criterion. The work also offers enterprises an effective customer management solution that reduces operating costs and improves operating efficiency.
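
The two building blocks named in the first contribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `smote` and `classifier_weight`, the parameters `k` and `seed`, and the use of NumPy are assumptions made for illustration. The sketch shows the usual SMOTE interpolation between a minority sample and one of its nearest minority neighbours, and the textbook AdaBoost base-classifier weight. The abstract does not specify how the thesis updates sample weights with SMOTE, so that step is left out.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (illustrative sketch).

    Assumes X_min holds at least two minority-class rows.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared Euclidean distances between minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]

    # Pick a random base sample and one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbour.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def classifier_weight(y_true, y_pred, sample_w):
    """Textbook AdaBoost weight: alpha = 0.5 * ln((1 - err) / err)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sample_w = np.asarray(sample_w, dtype=float)
    err = np.sum(sample_w * (y_true != y_pred)) / np.sum(sample_w)
    err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid division by zero / log(0)
    return 0.5 * np.log((1 - err) / err)
```

Because each synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region already covered by the minority class, which is why SMOTE lowers the imbalance ratio without simply duplicating rows.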
Keywords/Search Tags:imbalanced dataset, classification prediction algorithm, SMOTE, AdaBoost, profit model