
Neural Network Modeling of Imbalanced Missing Data and Its Application

Posted on: 2019-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: X M Zhou
GTID: 2428330596951050
Subject: Engineering
Abstract/Summary:
The classification of imbalanced data sets with missing values has long been a focus of data analysis. Traditional classification methods tend to yield low accuracy on the minority classes when applied to such data, whereas preprocessing the data and optimizing the classification method can effectively address this problem. Managing client default risk at online loan companies involves classifying imbalanced missing data, and predicting, preventing, and controlling this risk has always been a priority in the field. This paper focuses on neural network modeling of imbalanced missing data and its application, as follows:

(1) To address the shortcomings of existing data filling methods, this paper proposes a kNN-DBSCAN method for filling in missing data. Missing values are usually filled with the mean or by the nearest neighbor method, but few methods fill data from the perspective of the data distribution in high dimensions. A new method based on density clustering and the nearest neighbor method is proposed, and experiments demonstrate its effectiveness.

(2) To address the deficiencies of the classical synthetic minority over-sampling technique (SMOTE), an improved SMOTE method based on K-means is proposed. Classification on imbalanced data sets is mainly addressed by preprocessing the data sets and by optimizing the classification algorithms. The over-sampling methods used in existing data preprocessing are analyzed, and a SMOTE method based on K-means is proposed. This method prevents samples with the characteristics of one class from being judged as samples of another class. Experiments on UCI data sets demonstrate the effectiveness of the method.

(3) Considering the characteristics of the neural network and XGBOOST classification models, a neural network classification model based on XGBOOST is proposed. No single classification algorithm completely outperforms the others in both classification accuracy and stability. Based on UCI data sets, the characteristics of the XGBOOST model and the ANN model are analyzed, and a neural network classification model based on XGBOOST is proposed. Experiments show that the combined model is superior to either single model in accuracy and stability.

(4) Taking customer risk prediction at the internet loan company rong360 as the motivating application, the paper studies customer credit risk prediction on imbalanced missing data and predicts customer risk with the algorithms proposed in this paper, which improves the ability to identify customer default. The paper studies the practical problems rong360 faces in customer risk assessment, analyzes the weaknesses of its customer credit evaluation, and constructs a credit risk assessment index system from the data rong360 collected. The preprocessing methods and the combined classification model are then applied to the practical problem. Based on the key indicators, the probability of customer default is analyzed, and based on the experimental results, user portraits of two typical users are presented.
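The abstract does not spell out the K-means-based SMOTE procedure, so the following is only a minimal sketch of one common reading of the idea: cluster the minority samples first, then interpolate only between samples that share a cluster, so that no synthetic point is placed in the gap between distant minority groups. The function names `kmeans` and `kmeans_smote`, their parameters, and the choice of a random same-cluster partner (instead of a k-nearest-neighbor partner as in classical SMOTE) are illustrative assumptions, not the thesis's implementation.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def kmeans_smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the segment between two minority
    points from the same k-means cluster.
    """
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    # Only clusters with at least two points can supply a pair.
    usable = [pts for pts in clusters.values() if len(pts) >= 2]
    if not usable:
        raise ValueError("need a cluster with at least two minority points")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rng.choice(usable), 2)  # same-cluster pair
        t = rng.random()                          # interpolation weight
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic
```

With two well-separated minority groups, every synthetic point produced this way stays inside one group's bounding box, whereas unclustered interpolation between arbitrary minority pairs could place points in the region between the groups.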
Keywords/Search Tags:missing data, k-nearest neighbor, DBSCAN, imbalance data, SMOTE, K-means, artificial neural network, XGBOOST, credit risk