Research on the Prediction Method for Imbalanced Data Sets

Posted on: 2016-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: S B Li
Full Text: PDF
GTID: 2298330467488408
Subject: Software engineering
Abstract/Summary:
Learning from imbalanced data sets is a difficult problem in data mining with high practical value. It has therefore attracted wide attention from scholars in recent years, and many results have been published in journals and at high-level conferences. Imbalanced data arise naturally from real observations in many fields and objectively reflect the nature of the phenomena they describe: only a small part of the data is of interest, but that part is often hidden by a much larger amount of other data, which makes classification difficult. Classification strategies designed for traditional classification problems do not perform well on imbalanced data sets, and the problem has drawn great attention from experts and scholars around the world. As an important part of customer relationship management, customer churn management has become indispensable in modern enterprise management. In recent years, with the development of transmission and Internet technology, Distributed Database Management Systems (DDBMS) have been widely adopted by enterprises. The ample data that enterprises hold provide the necessary conditions for customer churn prediction based on data mining. A customer churn prediction system, an important part of an enterprise management analysis system, builds churn prediction models and identifies customers who are likely to leave, so that measures can be taken to retain them and reduce churn. Research on customer churn prediction has therefore become a hot topic because of its importance for improving enterprise competitiveness.
This thesis first introduces the concept of imbalanced data sets and the progress made by researchers worldwide on the imbalanced data classification problem. It explains why the problem is so difficult to solve, the treatments commonly adopted for it, and the metrics used to evaluate classification performance. Based on the current state of customer churn research, the thesis analyzes the application of data mining technology to customer churn, the characteristics of enterprise customer churn data, and the great influence of the networked society on modern enterprises. It also puts forward a “logistic regression prediction model based on clustering stratified sampling” and “customer churn warning methods based on the analysis of network public opinion”. The latter also includes parameter estimation for logistic regression under stratified sampling and a bias compensation method for the parameter estimates.
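The abstract does not say how the thesis compensates for bias, so the following is only a minimal sketch of one standard approach, the prior (intercept) correction for logistic regression fitted on a class-stratified sample. It is not the thesis's method. All function names, the 1000-customer data set, and the keep rates are assumptions made for illustration. To stay self-contained, the sketch fits a logistic model with no features, whose maximum-likelihood intercept is simply the log-odds of the positive class.

```python
import math
import random


def stratified_sample(labels, keep_rate, seed=0):
    """Return sorted indices of a sample drawn separately within each class.

    keep_rate maps a class label to the fraction of that class to keep
    (hypothetical helper, not taken from the thesis).
    """
    rng = random.Random(seed)
    chosen = []
    for cls, rate in keep_rate.items():
        members = [i for i, y in enumerate(labels) if y == cls]
        k = round(rate * len(members))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def intercept_only_fit(labels):
    """MLE intercept of a feature-free logistic model: log-odds of class 1."""
    pos = sum(labels)
    neg = len(labels) - pos
    return math.log(pos / neg)


def correct_intercept(b0_sampled, keep_pos, keep_neg):
    """Undo the intercept shift caused by unequal per-class keep rates.

    Sampling multiplies the odds of class 1 by keep_pos / keep_neg,
    so the log of that ratio is subtracted from the fitted intercept.
    """
    return b0_sampled - math.log(keep_pos / keep_neg)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Assumed toy data: 1000 customers, 50 of whom churn (label 1).
labels = [1] * 50 + [0] * 950

# Keep every churner but only 10% of non-churners.
idx = stratified_sample(labels, {1: 1.0, 0: 0.1})
sampled = [labels[i] for i in idx]

b0_sampled = intercept_only_fit(sampled)
b0_corrected = correct_intercept(b0_sampled, keep_pos=1.0, keep_neg=0.1)

print("churn rate implied by the sampled fit:  ", sigmoid(b0_sampled))
print("churn rate implied by the corrected fit:", sigmoid(b0_corrected))
```

On this toy data the uncorrected model implies a churn rate of about 0.34, while the corrected intercept recovers the population rate of 0.05. In a model with features, the same correction is applied to the intercept only, since sampling on the class label alone leaves the slope coefficients consistent.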
Keywords/Search Tags: imbalanced data, customer churn prediction, stratified sampling, logistic regression