
Research on a Distributed SVM Algorithm Based on the Hadoop Platform

Posted on: 2020-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2428330572485951
Subject: Engineering
Abstract/Summary:
As technology advances, the volume of data grows accordingly. The exponential growth of raw and unstructured data collected by various means has forced companies to change their business strategies and approaches; for a growing number of companies, revenue strategies are based entirely on the information extracted from data. Managing and processing such large-scale data sets, also known as Big Data, requires new methods and techniques, and storing and analyzing the ever-growing volume of data creates new technological challenges. The Support Vector Machine (SVM) is a powerful and widely accepted classifier in machine learning owing to its strong generalization capability. However, the classic SVM is ill-suited to large-scale data sets: its computational complexity and storage requirements grow tremendously with the size of the training data. To address the low efficiency of the classic SVM on large data sets, this thesis implements SVM training with the MapReduce parallel framework and analyzes its performance on single-node and multi-node Hadoop clusters. MapReduce is a distributed programming model that handles large-scale data sets by dividing them into smaller chunks that are processed in parallel. The experimental results show that, on large-scale data sets, an SVM trained on a multi-node cluster takes less time than one trained on a single-node cluster, effectively speeding up the training process.
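The abstract does not spell out the thesis's exact parallelization scheme, but a common MapReduce formulation of distributed SVM training has each mapper fit a local SVM on its input split and emit only that split's support vectors, after which a reducer retrains a global SVM on the pooled support vectors. The sketch below follows the Hadoop Streaming convention (plain scripts reading stdin and writing tab-separated key/value lines). The CSV input format, scikit-learn's SVC as the base learner, and the RBF kernel with C=1.0 are illustrative assumptions, not details taken from the thesis.

```python
#!/usr/bin/env python3
"""Minimal sketch of MapReduce-style distributed SVM training.

Assumptions (not from the thesis): input lines are "label,f1,f2,...",
scikit-learn's SVC is the base learner, and the merge step is the simple
"pool all support vectors and retrain" scheme.
"""
import sys
import numpy as np
from sklearn.svm import SVC


def parse_lines(stream):
    """Parse "label,f1,f2,..." lines into feature matrix X and labels y."""
    X, y = [], []
    for line in stream:
        parts = line.strip().split(',')
        if len(parts) < 2:
            continue
        y.append(float(parts[0]))
        X.append([float(v) for v in parts[1:]])
    return np.asarray(X), np.asarray(y)


def mapper(stream=sys.stdin, out=sys.stdout):
    """Train a local SVM on this mapper's chunk; emit only its support vectors."""
    X, y = parse_lines(stream)
    clf = SVC(kernel='rbf', C=1.0).fit(X, y)
    for i in clf.support_:  # indices of this chunk's support vectors
        row = ','.join(str(v) for v in X[i])
        # One shared key routes every support vector to a single reducer.
        out.write(f"sv\t{y[i]},{row}\n")


def reducer(stream=sys.stdin, out=sys.stdout):
    """Pool the support vectors from all chunks and train the global SVM."""
    values = (line.split('\t', 1)[1] for line in stream if '\t' in line)
    X, y = parse_lines(values)
    clf = SVC(kernel='rbf', C=1.0).fit(X, y)
    out.write(f"final model trained on {len(X)} pooled support vectors\n")


if __name__ == '__main__':
    mapper() if sys.argv[1:] == ['map'] else reducer()
```

Under this scheme the job would be launched with the standard Hadoop Streaming flags, e.g. hadoop jar hadoop-streaming.jar -files svm_mr.py -mapper "svm_mr.py map" -reducer "svm_mr.py reduce" -input ... -output .... Routing every support vector to one reducer is the simplest merge and assumes the pooled set fits on a single node; cascade-style variants instead merge support-vector sets pairwise over several rounds to bound any one node's load.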
Keywords/Search Tags:Support Vector Machine, Large Scale Data Sets, Hadoop Cluster, MapReduce Model