
Research on a Distributed SVM Algorithm Based on the Hadoop Platform

Posted on: 2020-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2428330572485951
Subject: Engineering
Abstract/Summary:
As technology advances, the volume of data grows accordingly. The exponential growth of raw and unstructured data collected by various means has forced companies to change their business strategies and approaches; for a growing number of companies, revenue strategies are based entirely on the information extracted from data. Managing and processing such large-scale data sets, also known as Big Data, requires new methods and techniques, and storing and analyzing the ever-growing volume of data creates new technological challenges. The Support Vector Machine (SVM) is a powerful and widely accepted classifier in machine learning owing to its strong generalization capability. However, the classic SVM is ill-suited to large-scale data sets: its computational complexity and storage requirements grow tremendously with the size of the training data. To address the low efficiency of the classic SVM on large data sets, this thesis implements SVM training with the MapReduce parallel framework and analyzes its performance on single-node and multi-node Hadoop clusters. MapReduce is a distributed programming model that handles large-scale data sets by dividing them into smaller chunks that are processed in parallel. The experimental results show that, on large-scale data sets, an SVM trained on a multi-node cluster takes less time than one trained on a single-node cluster, effectively speeding up the training process.
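The abstract does not spell out the thesis's exact parallelization scheme, but a common MapReduce formulation of distributed SVM training has each mapper fit a local SVM on its input split and emit only that split's support vectors, after which a reducer retrains a global SVM on the pooled support vectors. The sketch below follows the Hadoop Streaming convention (plain scripts reading stdin and writing tab-separated key/value lines). The CSV input format, scikit-learn's SVC as the base learner, and the RBF kernel with C=1.0 are illustrative assumptions, not details taken from the thesis.

```python
#!/usr/bin/env python3
"""Minimal sketch of MapReduce-style distributed SVM training.

Assumptions (not from the thesis): input lines are "label,f1,f2,...",
scikit-learn's SVC is the base learner, and the merge step is the simple
"pool all support vectors and retrain" scheme.
"""
import sys
import numpy as np
from sklearn.svm import SVC


def parse_lines(stream):
    """Parse "label,f1,f2,..." lines into feature matrix X and labels y."""
    X, y = [], []
    for line in stream:
        parts = line.strip().split(',')
        if len(parts) < 2:
            continue
        y.append(float(parts[0]))
        X.append([float(v) for v in parts[1:]])
    return np.asarray(X), np.asarray(y)


def mapper(stream=sys.stdin, out=sys.stdout):
    """Train a local SVM on this mapper's chunk; emit only its support vectors."""
    X, y = parse_lines(stream)
    clf = SVC(kernel='rbf', C=1.0).fit(X, y)
    for i in clf.support_:  # indices of this chunk's support vectors
        row = ','.join(str(v) for v in X[i])
        # One shared key routes every support vector to a single reducer.
        out.write(f"sv\t{y[i]},{row}\n")


def reducer(stream=sys.stdin, out=sys.stdout):
    """Pool the support vectors from all chunks and train the global SVM."""
    values = (line.split('\t', 1)[1] for line in stream if '\t' in line)
    X, y = parse_lines(values)
    clf = SVC(kernel='rbf', C=1.0).fit(X, y)
    out.write(f"final model trained on {len(X)} pooled support vectors\n")


if __name__ == '__main__':
    mapper() if sys.argv[1:] == ['map'] else reducer()
```

Under this scheme the job would be launched with the standard Hadoop Streaming flags, e.g. hadoop jar hadoop-streaming.jar -files svm_mr.py -mapper "svm_mr.py map" -reducer "svm_mr.py reduce" -input ... -output .... Routing every support vector to one reducer is the simplest merge and assumes the pooled set fits on a single node; cascade-style variants instead merge support-vector sets pairwise over several rounds to bound any one node's load.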
Keywords/Search Tags:Support Vector Machine, Large Scale Data Sets, Hadoop Cluster, MapReduce Model