Classification And Application Of Ensemble Learning In Unbalanced Data

Posted on: 2015-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: B B Zhou
Full Text: PDF
GTID: 2298330422482418
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Under fierce competition in telecommunications services, the three traditional mobile operators (CMCC, China Unicom, and China Telecom) are all considering how to attract new customers and retain existing ones. Of the two goals, retaining existing customers is especially important. Research shows that acquiring a new customer costs five times as much as retaining an existing one, and that an existing customer generates 16 times the profit of a new one. Reducing customer churn therefore matters greatly to the three operators, both for raising revenue and for cutting costs. Customer churn is nevertheless widespread among them. To address this problem, one of the three operators built a customer churn prediction model and proposed retention measures based on its predictions.

Customer churn prediction is a classification problem in data mining. This thesis uses historical customer consumption data from a telecom operator in Guangdong Province and models the data with a decision tree algorithm and ensemble learning algorithms. The models are evaluated by recall and F-measure and compared with the customer churn prediction model; on average, the ensemble learning algorithms increase recall and F-measure by 5.6% and 4.6%, respectively.

Under normal circumstances the customer churn rate is about 5%, so non-churning customers account for about 95%; data with such a skewed class distribution is called unbalanced data. For unbalanced data, this thesis presents an under-sampling method based on K-means clustering. The method improves on conventional under-sampling, and its effect is clear for the decision tree algorithm and the Bagging algorithm.

This thesis combines ensemble learning algorithms with a practical problem. Building a customer churn prediction model has practical significance: the model has played a major supporting role in telecom operators' customer retention analysis and decision-making. In addition, the K-means-based under-sampling method helps improve classification on unbalanced data.
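
The abstract does not spell out how the K-means-based under-sampling works or how recall and F-measure are computed, so the following is only a minimal sketch under stated assumptions, not the thesis's actual procedure. It assumes the majority class is clustered with plain Lloyd's K-means and that rows are then drawn from each cluster in proportion to its size; it also assumes the F-measure is the balanced F1 score. All function names, parameters, and the synthetic data are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def kmeans_undersample(majority, n_keep, k=3, seed=0):
    """Keep roughly n_keep majority rows, drawn from every cluster in
    proportion to its size (an assumed rule, not taken from the thesis)."""
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


def recall_and_f_measure(y_true, y_pred, positive=1):
    """Recall and F1 (assumed beta = 1) for the positive, i.e. minority, class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return recall, f_measure


# Synthetic illustration: 95 majority rows reduced to about the size of
# a 5-row minority class, mirroring the 95% / 5% split in the abstract.
majority_rows = [[float(i % 10), float(i // 10)] for i in range(95)]
reduced_majority = kmeans_undersample(majority_rows, n_keep=5)
```

Sampling from every cluster, rather than uniformly at random, is meant to keep the reduced majority set representative of the different customer groups. Because of rounding, the number of kept rows is only approximately `n_keep`.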
Keywords/Search Tags: unbalanced data, ensemble learning algorithm, decision tree algorithm, K-means clustering, customer churn, under-sampling method