Modeling And Application Of Support Vector Machine Based On Improved SMOTE

Posted on: 2018-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Qin
Full Text: PDF
GTID: 2348330536487836
Subject: Management Science and Engineering
Abstract/Summary:
With social and economic development, personal credit information has become increasingly important. As data mining technology continues to advance, machine learning based on big data has gradually replaced manual screening and plays a growing role in the credit information industry. Support Vector Machine (SVM) is a classical classification method in machine learning. It offers good classification performance and fast training, especially in non-linear classification scenarios. Built on rigorous mathematical derivation and solid statistical methods, SVM has been widely used in industrial production, intrusion detection, medical evaluation, user recommendation, management evaluation, decision systems, financial information collection and bioscience.
However, as technology develops, the cost of acquiring and storing data falls rapidly, and the data in classification problems grow more complex as their volume increases. As data dimensionality rises, class distributions also become increasingly skewed toward one class. These changes pose growing challenges for classification. For the support vector machine, they seriously degrade the performance of the classical classifier in practical scenarios. To cope with growing data volume and the complexity of practical scenarios, the impact of imbalanced data and index complexity on the classification results must be fully considered in light of the inherent characteristics of SVM. The classical support vector machine can then be improved, and its application value further enhanced, while preserving the rigorous theoretical foundation of SVM.
In this thesis, the theory and properties of the classical support vector machine are systematically studied.
To address the problem of data imbalance in SVM, the modeling and implementation of a solution are discussed in turn, its adaptive properties are given, and an improved support vector machine (SVM) algorithm with good robustness is proposed. The credit risk evaluation model of a micro-credit company is used as the practical application case. In testing, the method improves the classification accuracy for potential default customers. The main contents of this thesis are as follows:
(1) The modeling and application of SVM classifiers under fuzzy conditions are studied, with a focus on SVM classifiers based on interval numbers. For samples that contain interval numbers, a sampling method based on hypercube sampling is proposed, together with a binary-tree algorithm for sampling interval-number samples.
(2) The drawbacks of the traditional SMOTE algorithm on imbalanced data are analyzed: it operates on the minority-class samples without considering the meaning of the samples themselves. Building on SMOTE interpolation of the minority-class samples, the improved oversampling method uses the classification characteristics of the interval-number SVM to improve the distribution of the new synthetic samples. Finally, the complete model and algorithm flow of the improved SMOTE support vector machine for imbalanced data are given.
(3) The influence of the key indexes and related parameter settings on the classification results when using the improved SMOTE is analyzed, and a SMOTE support vector machine algorithm based on information gain optimization is proposed. First, a SMOTE support vector machine based on information-gain hypercube vertex sampling is established, and the parameters of the improved SMOTE-SVM model are tuned automatically by an optimization algorithm. This makes the parameter settings more reasonable and improves classification performance.
The detailed flow of the combined algorithm is also given.
(4) The practical problems that micro-credit companies face in credit risk assessment are studied, their weaknesses in customer credit evaluation are analyzed, and a credit risk evaluation index system is constructed from the actual operations of micro-credit companies. The support vector machine (SVM) algorithm is applied to this practical problem and compared with other classical classification algorithms on key performance indicators, and the distribution of the key indexes of customer default is analyzed. Finally, the typical characteristics of the two types of customers are profiled.
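For readers unfamiliar with the baseline that item (2) starts from, the following is a minimal sketch of classic SMOTE interpolation only. It is not the improved method of the thesis: the interval-number SVM guidance and the information-gain step are not reproduced here. The function name `smote_oversample`, the parameter defaults and the toy data are assumptions made purely for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by classic SMOTE interpolation.

    minority: list of equal-length numeric feature lists (minority class only).
    k: number of nearest minority neighbours to pick from (illustrative default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment
        # from the base sample to the chosen neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy minority class with two features per sample.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, plain SMOTE ignores where the majority class sits. That is the kind of limitation the abstract says the improved oversampling method is meant to address.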
Keywords/Search Tags: SMOTE, support vector machine, unbalanced data, credit risk