Research on SVM Classification of Unbalanced Data and Its Application in Identifying Poor Students in Colleges and Universities

Posted on: 2020-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: R H Chen
Full Text: PDF
GTID: 2428330620455435
Subject: Engineering
Abstract/Summary:
In the current Internet era, people generate large amounts of valuable data every day. How to find intrinsic and extrinsic correlations in these data and extract useful information from them is a hotspot of current data mining and artificial intelligence research. Data classification is one of the most basic and important research topics in machine learning and an indispensable part of practical applications, but real-world data distributions are often unbalanced, and many algorithms perform unsatisfactorily when classifying unbalanced data. The Support Vector Machine (SVM) is an important machine learning classification algorithm based on statistical learning theory and the structural risk minimization principle. The optimal separating hyperplane it seeks is determined by the support vectors, and the number of support vectors contributed by each class tends to grow with that class's sample size. As a result, when class sizes are unbalanced, SVM classifies the minority class poorly. To address SVM-based classification of unbalanced data, the research work of this thesis mainly covers the following aspects:
1. The basic principles of machine learning algorithms based on statistical learning theory are discussed, including artificial neural networks based on empirical risk minimization and SVM algorithms based on structural risk minimization. The analysis focuses on SVM algorithms and summarizes the research progress on SVM.
2. The nature of the unbalanced data classification problem is analyzed, and the various SVM-based methods for handling it are summarized.
3. Consumption data extracted for impoverished college students are preprocessed and their features analyzed in detail. Grade identification models for impoverished college students are built with an artificial neural network and a decision tree. To analyze the behavior of the SVM algorithm on small data sets, a small data set is constructed as a comparative experimental data set.
4. An impoverished student grade identification model based on the SVM algorithm is established. The unbalanced data classification problem is handled mainly at the data level and the algorithm level. Building on data sampling, the NCRBoost-SVM algorithm, which combines data sampling with ensemble learning, is proposed. Multiple models are compared and analyzed on data sets of different sizes. The experimental results show that SVM performs well on small data sets, that undersampling can improve SVM classification performance, and that the proposed algorithm may achieve better results.
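The data-level and algorithm-level strategies mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's NCRBoost-SVM algorithm, whose details are not given here. It only shows two generic techniques: random undersampling of the larger classes, and inverse-frequency class weights of the kind often passed to a cost-sensitive SVM. The function names, the toy features, and the 90/10 class split are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Data-level sketch: shrink every class to the size of the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Draw n_min rows without replacement from each class.
        for features in rng.sample(rows, n_min):
            X_out.append(features)
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Algorithm-level sketch: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical toy data: 90 majority-class (0) and 10 minority-class (1) samples.
X = [[i, i % 7] for i in range(100)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = random_undersample(X, y)
weights = balanced_class_weights(y)
print(Counter(y_bal))
print(weights)
```

After undersampling, both classes have the same number of samples, at the cost of discarding most majority-class data. The class weights keep all data and instead make errors on the minority class more expensive during training.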
Keywords/Search Tags: Data mining, SVM, unbalanced data classification, data sampling, ensemble learning, impoverished student grade identification