Research On Support Vector Machine For Large Scale Imbalanced Data

Posted on:2017-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:X N Dong

Full Text:PDF

GTID:2428330569499063

Subject:Computer Science and Technology

Abstract/Summary:


In recent years, big data analysis technology has developed explosively, and data mining has played a positive role in advancing both academia and industry. Classification is a core task in data mining, and the support vector machine is an excellent classification algorithm. However, it yields low classification accuracy on imbalanced data. Furthermore, conventional classification algorithms suffer from long training times on massive data sets, which has encouraged researchers to study distributed classification algorithms. In this thesis, we explore support vector machines for large scale imbalanced data classification, focusing on improving classification accuracy on imbalanced data and reducing training time. The main contributions of this thesis are as follows:

First, to address the poor accuracy of classification algorithms on imbalanced data, we propose an ensemble support vector machine based on boosting, boostingsvm. The algorithm preprocesses the training data with a stratified under-sampling algorithm based on clustering, which we present. In addition, we incorporate the ideas of boosting learning into boostingsvm and optimize the updating rule of boosting learning. Experimental results demonstrate that the stratified under-sampling algorithm based on k-means balances the data efficiently, that the sampled data represent the distribution of the original data, and that boostingsvm improves classification accuracy on imbalanced data.

Second, serial classification algorithms suffer from long training times when dealing with large scale imbalanced data. To address this problem, we propose a distributed baggingsvm algorithm based on a group training model. The algorithm uses an optimized cascade support vector machine to preprocess the data and splits the training data into pieces so that the classifiers can be trained in parallel. Experimental results show that our method significantly reduces training time at a slight cost in classification accuracy.
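
The stratified under-sampling idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the majority class is clustered with a plain k-means, and each cluster contributes samples in proportion to its size, so the reduced set roughly follows the original distribution. The names `kmeans`, `stratified_undersample`, `n_keep` and the toy data are hypothetical and are not taken from the thesis.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def stratified_undersample(majority, n_keep, k=3, seed=0):
    """Keep about n_keep majority points, drawn per cluster in proportion to cluster size.

    Assumption: proportional allocation; the thesis may allocate differently.
    """
    rng = random.Random(seed)
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(n_keep * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(quota, len(members))))
    return kept


# Toy data: 300 majority points in three blobs, 30 minority points.
data_rng = random.Random(1)
majority = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
minority = [(data_rng.gauss(2.5, 0.5), data_rng.gauss(2.5, 0.5)) for _ in range(30)]

# Reduce the majority class to roughly the size of the minority class.
balanced_majority = stratified_undersample(majority, n_keep=len(minority))
print(len(majority), "->", len(balanced_majority), "majority samples")
```

Because each cluster's quota is rounded separately, the reduced set may differ from `n_keep` by a few samples. The balanced set (`balanced_majority` plus `minority`) would then be passed to whatever classifier is being trained, for example one SVM per boosting round.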

Keywords/Search Tags:

data mining, large scale imbalanced data, support vector machine, distributed machine learning


Related items

1	Study On Imbalanced Data Sets Classification Method And Its Application In Telecommunication
2	Research And Application Of The Support Vector Machine On Large-scale Datas
3	Research On Large Scale Sparse Support Vector Machines
4	Research On Ensemble Learning
5	The Research On Large-scale Support Vector Machine And The Applications
6	Research Of Multiattribute And Large-scale Data Classification Algorithm Based On Support Vector Machine
7	Research On Classification Algorithms For Imbalanced Dataset
8	Research On Support Vector Machine Models And Algorithms For Imbalanced Data
9	Research On Classification Algorithm Of Data Mining Based On Improved Support Vector Machine
10	Some Algorithms Research On Support Vector Machines