Research And Application Of Data Mining Classification Algorithm

Posted on: 2018-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: M M Le
Full Text: PDF
GTID: 2348330512489006
Subject: Engineering
Abstract/Summary:
With the development of information technology, human activity now produces massive amounts of data. Data mining is an important means of discovering valuable information in such data and is widely used in practice. Classification is one of the most important data mining techniques and can address most classification problems encountered in real life, so it has attracted extensive attention from academia and industry and plays an increasingly important role in politics, the economy, transportation and daily life. The main work of this thesis is to study data mining classification algorithms, evaluate them experimentally on real data sets, and use ensemble learning algorithms to solve the problem of life insurance risk rating. First, the thesis introduces the background and significance of the research and the state of data mining research in China and abroad, and summarizes the relevant theory of data mining, focusing on data preprocessing, feature engineering, the performance evaluation of classification algorithms, the class imbalance problem and multi-class classification. Second, the thesis studies the classification algorithms commonly used in data mining. The classical algorithms covered are naive Bayes, logistic regression, K-nearest neighbors, the support vector machine and the decision tree; for each, the basic principle, the implementation steps, and the advantages and disadvantages are given. The ensemble learning part introduces two families of ensemble methods, Bagging and Boosting. The representative Bagging algorithm is the random forest, and the representative Boosting algorithms are GBDT and XGBoost. The algorithms are run on three different data sets, and the comparison shows that the ensemble learning algorithms outperform the classical classification algorithms as the data set grows. Finally, a life insurance risk rating model based on ensemble learning classification is established and applied to the practical problem of life insurance risk rating assessment. The life insurance data are preprocessed and passed through feature engineering to obtain the data set needed to build the model. The data set is divided into a training set and a test set, and the Kappa statistic is used as the evaluation metric on the test set. The ensemble learning models are trained on the training set: a random forest model is used for feature selection, and an XGBoost model is used to build the life insurance risk rating model. To improve the Kappa value, model fusion is applied and the model output is further optimized. The result is an optimized combined model for life insurance rating prediction that effectively solves the practical problem of life insurance risk rating.
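The abstract names Kappa as the evaluation metric but does not say which variant was used. As an illustration only, the following minimal sketch computes the unweighted Cohen's kappa, which compares the observed agreement between true and predicted ratings with the agreement expected by chance. The function name `cohen_kappa` and the sample labels are hypothetical, and an ordinal rating task may well call for a weighted variant instead.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Illustrative sketch; the thesis abstract does not specify the variant.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement: fraction of samples whose labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement, from the marginal frequency of each label.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if p_e == 1.0:
        # Both sequences contain one identical label; kappa is undefined.
        # Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical risk ratings on a small test set.
actual = [1, 2, 3, 1, 2, 3]
predicted = [1, 2, 3, 1, 3, 2]
kappa = cohen_kappa(actual, predicted)
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance, which is why kappa is often preferred over plain accuracy when the rating classes are imbalanced.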
Keywords/Search Tags: data mining, classification algorithm, ensemble learning, risk rating assessment