
Research on an Unbalanced Data Mining Algorithm Based on Cost-Sensitive Learning

Posted on: 2022-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Dong
GTID: 2518306749963479
Subject: Computer application technology
Abstract/Summary:
Reducing customer churn and improving profitability are concerns for every company. Customers are a company's main source of profit, so controlling customer loss is of great significance to its survival and development. Research in recent years shows that more and more companies have begun to pay attention to customer churn, and to the effectiveness and applicability of methods for predicting it. Customer churn early warning is a common binary classification task in data mining, and because churned customers make up only a small proportion of all customers, it is also an imbalanced learning problem. In most real-world classification datasets the classes are unequal in size, and the sizes often differ considerably. Traditional machine learning models perform well when the training data is balanced across classes but perform poorly on unbalanced data, which has motivated research on imbalanced learning.

Starting from the original AdaBoost algorithm, this paper selects the decision tree as the base classifier and incorporates the main ideas of cost-sensitive learning: the cost loss function and the optimal base classifier are used to reconstruct the weight update strategy and the formula for the weighting coefficients of the original algorithm. The resulting algorithm, CCSADA, incurs a lower misclassification cost and pays more attention to minority-class samples, so it can better identify churned customers.

On four public unbalanced datasets, CCSADA and the original AdaBoost algorithm are each applied, and the effectiveness of CCSADA is verified by comparing their evaluation metrics. For telecom broadband customer churn, this paper uses real customer data from a telecom company in Yunnan Province for verification and analysis. After data cleaning and feature selection, three models are built in turn: a decision tree, the traditional AdaBoost algorithm, and the CCSADA algorithm proposed in this paper. Comparing the values of several evaluation metrics further confirms that CCSADA performs better than the other two algorithms. It can be concluded that CCSADA helps improve classification performance on imbalanced datasets, and that the method alleviates, to a certain extent, the poor churn prediction caused by data imbalance.
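The abstract does not state the CCSADA formulas, so the following is only a minimal sketch of how misclassification costs can enter both the weight update and the weighting coefficient of AdaBoost. It is a generic cost-sensitive variant, not the thesis's algorithm. Everything in it is an assumption made for illustration: the function names, the one-feature decision stump standing in for the decision tree base classifier, the toy data, and the cost of 5 assigned to the minority (churn) class.

```python
import math

def stump_predict(stump, x):
    """Predict +1 or -1 with a one-feature threshold rule."""
    feature, threshold, polarity = stump
    return polarity if x[feature] > threshold else -polarity

def best_stump(X, y, weights):
    """Return the stump with the smallest weighted error under `weights`."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, weights)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best

def fit_cost_sensitive_adaboost(X, y, cost, rounds=10):
    """Boost stumps; cost[i] is the (assumed) cost of misclassifying sample i."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    eps = 1e-10  # keeps the logarithm finite when a stump makes no errors
    for _ in range(rounds):
        # Choose the base classifier on cost-scaled sample weights.
        scaled = [c * wi for c, wi in zip(cost, w)]
        stump = best_stump(X, y, scaled)
        preds = [stump_predict(stump, xi) for xi in X]
        right = sum(s for s, p, yi in zip(scaled, preds, y) if p == yi)
        wrong = sum(s for s, p, yi in zip(scaled, preds, y) if p != yi)
        # Weighting coefficient from the cost-scaled correct/incorrect mass.
        alpha = 0.5 * math.log((right + eps) / (wrong + eps))
        if alpha <= 0:  # no better than chance under the scaled weights
            break
        ensemble.append((alpha, stump))
        # Weight update: the usual exponential factor, multiplied by the cost.
        w = [s * math.exp(-alpha * yi * p)
             for s, p, yi in zip(scaled, preds, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

# Toy data (hypothetical): +1 marks the minority "churn" class.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [-1, -1, -1, -1, 1, -1, -1, 1]
cost = [5.0 if yi == 1 else 1.0 for yi in y]

if __name__ == "__main__":
    plain = fit_cost_sensitive_adaboost(X, y, [1.0] * len(y))
    weighted = fit_cost_sensitive_adaboost(X, y, cost)
    for name, model in (("equal costs", plain), ("minority cost 5", weighted)):
        print(name, [predict(model, xi) for xi in X])
```

Setting every cost to 1 reduces the sketch to ordinary discrete AdaBoost, which makes it easy to compare the two settings on the same data. Raising the minority cost makes errors on churn samples weigh more both when the base classifier is chosen and when sample weights are updated, which is the general effect the abstract describes.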
Keywords/Search Tags: unbalanced data, cost-sensitive, AdaBoost, broadband customer loss