
Distributed SVM Algorithm With K-means

Posted on: 2016-06-15  Degree: Master  Type: Thesis
Country: China  Candidate: R Wang  Full Text: PDF
GTID: 2348330488474312  Subject: Computer system architecture
Abstract/Summary:
With the rapid development of network technology, the volume of data generated on networks has grown at an astonishing rate, and the structural complexity of that data keeps increasing. How to extract useful information from such a sea of complex data is currently a hot research topic. The Support Vector Machine (SVM) is a well-known supervised classification method with high predictive accuracy, resistance to overfitting, few parameters to tune, and other advantages, which makes it a preferred method for classification problems. However, when the training data set grows so large that its storage and processing requirements exceed the capacity of a single machine, the large memory footprint and long training time of a traditional stand-alone SVM limit its application to big data. Studies have found that cluster-based distributed processing can effectively shorten training time and relieve the memory burden, so studying SVM algorithms for distributed parallel computing is particularly important.

Currently, most distributed parallel SVM algorithms with relatively high prediction accuracy are implemented with a multi-layer iterative full-feedback mechanism: non-support vectors are discarded layer by layer while support vectors (SVs) are retained, and the SVs preserved at the final layer form a globally optimal solution over the original training set. These algorithms use random partitioning to generate the sub-sample sets that are trained in parallel. Experimental tests show that generating sub-sample sets by random partitioning has two deficiencies: first, the distribution of a sub-sample set is likely to deviate from that of the original data set; second, in a concurrent environment, the sub-sample sets used in each training run may vary. As a result, the overall training model is unreliable, the final prediction accuracy is low, and results jitter noticeably across repeated training runs.
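The distribution-bias problem described above can be illustrated with a small sketch (the toy data set, partition count, and function names here are illustrative assumptions, not taken from the thesis): randomly partitioning an imbalanced labelled set can leave individual sub-sample sets with class proportions that deviate from the original set.

```python
import random

def positive_ratio(samples):
    """Fraction of samples whose label is 1."""
    return sum(1 for _, y in samples if y == 1) / len(samples)

random.seed(7)

# Imbalanced toy set: 10 positives among 100 samples, as (feature, label) pairs
data = [(i, 1) for i in range(10)] + [(i, 0) for i in range(90)]
random.shuffle(data)

# Random partition into 5 equal sub-sample sets, as random-partition schemes do
k = 5
subsets = [data[i::k] for i in range(k)]

overall = positive_ratio(data)               # 0.10 by construction
per_subset = [positive_ratio(s) for s in subsets]
bias = max(abs(r - overall) for r in per_subset)
print(f"overall ratio {overall:.2f}, per-subset ratios {per_subset}, max bias {bias:.2f}")
```

Because the shuffle, not the data distribution, decides which samples land in each subset, the per-subset ratios change from run to run, which is exactly the jitter across repeated training runs that the thesis points out.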
To this end, this thesis proposes a parallel SVM optimization algorithm that generates the sub-sample sets with k-means clustering: the unsupervised k-means algorithm divides the original data set, thereby effectively avoiding the problems of random partitioning. The parallel SVM optimization algorithm is deployed on the popular distributed computing platform Hadoop and tested on experimental data sets. Experimental results show that, compared with random partitioning as the number of partitions increases, the proposed optimization algorithm not only effectively reduces the distribution deviation between the generated sub-sample sets and the original data set, but also reduces the jitter of the overall training model. As a result, the algorithm is robust and has better generalization ability.
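A minimal sketch of the proposed partitioning step (pure Python; the deterministic farthest-point initialisation is an assumption made here for reproducibility, and the thesis itself runs the training on Hadoop): each k-means cluster becomes one sub-sample set that would feed one local SVM trainer.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_partition(points, k, iters=100):
    """Partition points into k cluster-based sub-sample sets via Lloyd's algorithm.

    Farthest-point seeding is used here only so the sketch is deterministic;
    any standard k-means initialisation would serve the same purpose.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        # Assign every point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        # Recompute each center as its cluster mean (keep old center if empty)
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:   # converged
            break
        centers = new_centers
    return groups                    # each group is one sub-sample set

# Toy data: two well-separated blobs; each becomes one sub-sample set
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
subsets = kmeans_partition(data, k=2)
```

Because each subset is a cluster, its points come from one coherent region of the original distribution rather than an arbitrary shuffle, which is what keeps the sub-sample sets stable across repeated runs.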
Keywords/Search Tags: Support Vector Machine, Clustering, Distributed Computing, Random Partition, Hadoop