Research On Classification Of Imbalanced Telecom Customer Data

Posted on: 2018-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: N N Guo
GTID: 2348330533465849
Subject: Circuits and Systems
Abstract/Summary:
In real telecom customer datasets, the proportion of lost customers is far lower than that of non-lost customers, so customer churn prediction is usually treated as an imbalanced data classification problem. Conventional classification algorithms take overall prediction accuracy as the learning target and ignore both the effect of the skewed data distribution on the classification result and the ability of each feature to discriminate between class labels. As a result, classifiers over-learn the non-lost customers and identify the lost customers poorly. It is therefore important to explore efficient imbalanced data classification algorithms to address the data imbalance faced by churn prediction. The main research contents are as follows:

1. Because conventional classification algorithms have difficulty identifying lost customers in imbalanced telecom customer datasets, an improved dissimilarity-based imbalanced data classification algorithm (IDBC) is proposed. The algorithm applies dissimilarity representation theory, which redefines the description of a sample object by using the dependencies among sample objects. On the basis of the dissimilarity representation method, it improves the prototype selection strategy and adds a feature selection step.

2. In building the classification model, feature selection is first used to eliminate the interference of redundant and irrelevant customer attributes with prototype selection. The improved sample subset optimization technology (ISSO) is then adopted to select the most valuable prototype set from the whole dataset. Finally, a new feature space is constructed from the dissimilarities between training-set samples and the prototype set, and between test-set samples and the prototype set. The dissimilarity-based datasets mapped into this feature space are then learned with conventional classification algorithms.

3. Six ordinary imbalanced datasets from the UCI database and two telecom customer datasets were selected to verify the effectiveness of the algorithm. In addition, the influence of the feature selection method, the prototype selection strategy, the number of prototype objects, and the dissimilarity measure on the IDBC algorithm was analyzed.

The experimental results indicate that: (1) using the improved sample subset optimization technology (ISSO) eliminates the uncertainty caused by random selection; (2) the IDBC algorithm is not affected by the skewed class distribution, and its ability to discriminate the rare class outperforms existing state-of-the-art approaches; (3) it is reasonable for the IDBC algorithm to use the mRMR feature selection method, the ISSO prototype selection strategy, twenty prototype objects, and the standardized Euclidean distance measure to solve the problem of imbalanced telecom customer classification.
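
The dissimilarity transformation described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the synthetic data, and the way prototypes are picked here (the first ten samples of each class) are assumptions made only for illustration, and the ISSO prototype selection and mRMR feature selection steps are not reproduced. The sketch shows only how training and test samples might be re-described by their standardized Euclidean distances to twenty prototype objects.

```python
import numpy as np


def standardized_euclidean(X, P, scale):
    """Pairwise standardized Euclidean distances between rows of X and rows of P.

    Each feature difference is divided by that feature's standard deviation
    before the usual Euclidean norm is taken.
    """
    diff = (X[:, None, :] - P[None, :, :]) / scale
    return np.sqrt((diff ** 2).sum(axis=2))


def dissimilarity_transform(X_train, X_test, prototype_idx):
    """Map samples into a space whose features are distances to the prototypes.

    prototype_idx is assumed to come from some prototype selection step
    (ISSO in the thesis; any index array works in this sketch).
    """
    prototypes = X_train[prototype_idx]
    scale = X_train.std(axis=0)
    scale[scale == 0] = 1.0  # avoid division by zero for constant features
    D_train = standardized_euclidean(X_train, prototypes, scale)
    D_test = standardized_euclidean(X_test, prototypes, scale)
    return D_train, D_test


# Synthetic imbalanced data (hypothetical): 200 non-lost vs. 20 lost customers.
rng = np.random.default_rng(0)
X_major = rng.normal(loc=0.0, scale=1.0, size=(200, 5))
X_minor = rng.normal(loc=1.5, scale=1.0, size=(20, 5))
X_train = np.vstack([X_major, X_minor])
y_train = np.array([0] * 200 + [1] * 20)
X_test = rng.normal(loc=0.5, scale=1.0, size=(30, 5))

# Placeholder prototype choice (NOT ISSO): first ten samples of each class.
prototype_idx = np.concatenate([
    np.flatnonzero(y_train == 0)[:10],
    np.flatnonzero(y_train == 1)[:10],
])

D_train, D_test = dissimilarity_transform(X_train, X_test, prototype_idx)
print(D_train.shape, D_test.shape)
```

Each row of `D_train` and `D_test` has one column per prototype, so any conventional classifier can be trained on `D_train` with the original labels and evaluated on `D_test`.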
Keywords/Search Tags: Customer churn prediction, Imbalanced data classification, Sample Subset Optimization (SSO), Prototype selection, Dissimilarity transformation