Research And Application Of Imbalance Data Classification Based On SVM

Posted on: 2018-06-28; Degree: Master; Type: Thesis
Country: China; Candidate: M Hong
GTID: 2348330536974686; Subject: Engineering / Computer Technology
Abstract/Summary:
With the development of computers and information technology, large volumes of data are generated every day in production and daily life. How to effectively discover the knowledge and regularities in these data, and how to classify and predict them, has become an important research topic in artificial intelligence, machine learning, and related fields. SVM is a classification algorithm based on statistical learning theory and the structural risk minimization principle. Its decision function is determined by only a few support vectors, so adding or deleting non-support-vector samples does not affect the performance of the model. Compared with traditional classification algorithms, SVM has strong generalization ability, does not easily fall into local minima, is suitable for high-dimensional, small-sample data, and can effectively solve classification problems on balanced data sets.

However, when the data distribution of the two classes is imbalanced, SVM shows the following deficiencies. First, because SVM is based on soft-margin maximization, the separating hyperplane in the boundary region tilts toward the minority class. Second, the imbalanced ratio of support vectors also results in more negative support vectors around a test sample.

This thesis focuses on the difficulties and shortcomings of SVM in imbalanced data classification, studies the problem at both the data level and the algorithm level, and applies the resulting imbalanced data classification algorithm to the micro-blog emotion classification problem. The main work includes the following three aspects:

1) At the data level, a resampling algorithm, BADASYN, based on adaptive synthesis of class-boundary samples is proposed. The algorithm first finds the minority samples in the boundary region between the two classes, then adaptively synthesizes new minority samples using the ADASYN method and adds the newly synthesized samples to the training set. The support vectors of the SVM model are then mainly composed of newly synthesized samples, and the separating hyperplane finally moves closer to the majority-class samples.

2) At the algorithm level, a selective ensemble learning method, NCAB-SVM, based on negative correlation learning and the AdaBoost SVM algorithm is proposed. Negative correlation learning theory is integrated into the AdaBoost SVM training process, with the aim of training a group of strong SVM classifiers that together form a stronger ensemble classification system. The algorithm uses negative correlation learning theory to compute the correlation between the classifiers, adapts the weight of each classifier according to its correlation value, and then obtains the weighted decision classifier.

3) Focusing on the imbalanced sample distribution and feature distribution in the micro-blog emotion classification problem, an SVM-based imbalanced data classification algorithm that combines the data level and the algorithm level is used to classify the emotional polarity of micro-blog posts. First, the BADASYN algorithm is used to synthesize minority-class samples and adjust the imbalance ratio. Then, the NCAB-SVM algorithm is used to train a series of SVM base classifiers and selectively ensemble them to obtain the decision system. Finally, Sina micro-blog data sets crawled from different domains and published evaluation data sets are used to test the performance of the method.
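The data-level idea in item 1) can be illustrated with a minimal sketch. The abstract does not give the details of BADASYN, so everything specific here is an assumption made for illustration: the function names `k_nearest` and `boundary_adasyn` are hypothetical, the boundary test (at least half, but not all, of a minority sample's k nearest neighbours belong to the majority class) is borrowed from Borderline-SMOTE, and the per-sample allocation follows the usual ADASYN rule of synthesizing more points where the majority share of the neighbourhood is higher. It uses only the Python standard library.

```python
import math
import random


def k_nearest(point, candidates, k):
    """Indices of the k candidates closest to `point` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda j: math.dist(point, candidates[j]))
    return order[:k]


def boundary_adasyn(minority, majority, n_new, k=5, seed=0):
    """Synthesize roughly n_new minority points near the class boundary.

    Illustrative sketch only; not the thesis's actual BADASYN procedure.
    """
    rng = random.Random(seed)
    points = list(minority) + list(majority)  # minority samples come first
    n_min = len(minority)

    # Step 1 (assumed boundary test): keep minority samples whose k nearest
    # neighbours are at least half, but not entirely, majority samples.
    # The weight is the majority share of the neighbourhood, as in ADASYN.
    weights = {}
    for i, x in enumerate(minority):
        neigh = [j for j in k_nearest(x, points, k + 1) if j != i][:k]
        n_major = sum(1 for j in neigh if j >= n_min)
        if k / 2 <= n_major < k:
            weights[i] = n_major / k

    # Step 2: allocate n_new across boundary samples in proportion to their
    # weights, then interpolate toward a random nearby minority sample.
    total = sum(weights.values())
    synthetic = []
    for i, w in weights.items():
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        if not others:
            continue
        mates = [others[j] for j in k_nearest(x, others, k)]
        for _ in range(round(n_new * w / total)):
            mate = rng.choice(mates)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, mate)))
    return synthetic


# Toy data: two minority points sit next to the majority cluster,
# the remaining minority points are far from it.
minority = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
majority = [(1.4, 1.2), (1.2, 1.4), (1.5, 1.5), (1.6, 1.3),
            (1.3, 1.6), (2.0, 2.0), (1.8, 1.7), (2.2, 1.9)]
synthetic = boundary_adasyn(minority, majority, n_new=10, k=3)
print(len(synthetic), "synthetic minority samples")
```

In this toy set only the two minority points adjacent to the majority cluster pass the boundary test, so all new samples are interpolated from them. The synthesized points would then be appended to the training set before fitting the SVM; because of rounding, the number of points returned can differ slightly from `n_new` in general.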
Keywords/Search Tags:SVM, Resampling, Imbalanced data classification, Negative correlation learning, Selective ensemble learning