
Research On Hadoop Based Fuzzy Support Vector Machine

Posted on: 2016-08-30    Degree: Master    Type: Thesis
Country: China    Candidate: L Liu    Full Text: PDF
GTID: 2308330473965419    Subject: Computer application technology
Abstract/Summary:
The support vector machine (SVM) is a machine learning algorithm grounded in statistical learning theory; in practical applications it embodies the principle of structural risk minimization. SVM handles difficult problems such as nonlinearity, high dimensionality, and over-fitting well, and its strong learning ability has led to wide application in speech recognition, face recognition, text categorization, and other fields. However, much of the information in the real world is fuzzy, and training an SVM on samples that carry fuzzy information skews the precision of the classification results. The fuzzy support vector machine (FSVM) was proposed to address this problem, and FSVM has recently become an active research topic.

Current FSVMs cannot train imbalanced samples properly: their membership functions do not reflect the importance of each sample precisely and may lead to classification errors. To address these problems, an improved FSVM is proposed. An imbalance factor based on the ratio of the positive class to the negative class is introduced, and in the design of the membership function, samples are first separated into outliers, noise points, support vectors, and inner vectors according to sample density; combined with a distance factor, each sample is then assigned a different fuzzy membership. Experiments show that the improved FSVM performs better on imbalanced data containing many outlier and noise points.

FSVM is a complex algorithm that can take a long time to train, especially on large datasets. To address this, a Hadoop-based FSVM is proposed. The method exploits Hadoop's efficiency in processing large datasets and uses a cascade model to design the MapReduce jobs. First, a proportional partition algorithm divides the original data into subsets; the proposed FSVM then trains each subset to obtain its support vectors; finally, the support vectors of pairs of subsets are merged and retrained, and this step is repeated until the global support vectors are obtained. This divide-and-conquer strategy suits large datasets well: it balances the use of limited resources and shortens training time. Experiments on a small Hadoop cluster show that the algorithm greatly reduces training time without reducing the precision of the classification model.
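The density- and distance-based membership design can be sketched as follows. The density thresholds, neighbour count `k`, and the 0.1/0.9/0.5 weights are illustrative assumptions, not values taken from the thesis; the imbalance factor here simply scales the majority class down by the class ratio.

```python
import numpy as np

def fuzzy_memberships(X, y, k=3, low=0.2, high=0.6):
    """Assign fuzzy memberships from sample density and distance to the
    class centre. Thresholds and weights are illustrative assumptions."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n_pos, n_neg = int(np.sum(y == 1)), int(np.sum(y == -1))
    # imbalance factor: scale the majority class down by the class ratio
    factor = {1: min(1.0, n_neg / n_pos), -1: min(1.0, n_pos / n_neg)}
    m = np.empty(len(X))
    for cls in (1, -1):
        idx = np.where(y == cls)[0]
        Xc = X[idx]
        centre = Xc.mean(axis=0)
        # density: inverse mean distance to the k nearest same-class points
        D = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        D.sort(axis=1)
        kk = min(k, len(idx) - 1)
        dens = 1.0 / (D[:, 1:kk + 1].mean(axis=1) + 1e-12)
        dens = (dens - dens.min()) / (dens.max() - dens.min() + 1e-12)
        # distance factor: samples nearer the class centre weigh more
        dist = np.linalg.norm(Xc - centre, axis=1)
        dist_f = 1.0 - dist / (dist.max() + 1e-12)
        base = np.where(dens < low, 0.1 * dist_f,        # outliers / noise
               np.where(dens < high, 0.9,                # candidate support vectors
                        0.5 + 0.5 * dist_f))             # dense inner vectors
        m[idx] = factor[cls] * base
    return m
```

Under this scheme, an isolated far-away point receives a near-zero membership, so it contributes little to the FSVM objective, while dense points near the class centre keep weights close to one.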
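The cascade training procedure above can be sketched in-process as follows. The sub-trainer here is a crude stand-in that merely keeps, per class, the points nearest the opposite class's centroid; a real implementation would train the improved FSVM on each subset inside MapReduce jobs and return its support vectors.

```python
import numpy as np

def proportional_partition(y, n_parts):
    """Split sample indices into n_parts subsets, each preserving the
    original class ratio (the 'proportional partition' step)."""
    parts = [[] for _ in range(n_parts)]
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        for i, chunk in enumerate(np.array_split(idx, n_parts)):
            parts[i].extend(chunk.tolist())
    return [np.array(p) for p in parts]

def toy_sv_selector(X, y):
    """Stand-in for FSVM training: keep, per class, the two points nearest
    the opposite class's centroid. A real implementation would train the
    improved FSVM and return its support-vector indices."""
    keep = []
    for cls in np.unique(y):
        other_centroid = X[y != cls].mean(axis=0)
        idx = np.where(y == cls)[0]
        d = np.linalg.norm(X[idx] - other_centroid, axis=1)
        keep.extend(idx[np.argsort(d)[:2]].tolist())
    return np.array(sorted(keep))

def cascade_train(X, y, n_parts, train_svs=toy_sv_selector):
    """Cascade scheme: train each partition, then repeatedly merge pairs of
    support-vector sets and retrain until one global set remains."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    groups = proportional_partition(y, n_parts)
    groups = [g[train_svs(X[g], y[g])] for g in groups]   # level-0 training
    while len(groups) > 1:
        merged = []
        for a, b in zip(groups[::2], groups[1::2]):       # pairwise merge
            pair = np.concatenate([a, b])
            merged.append(pair[train_svs(X[pair], y[pair])])
        if len(groups) % 2:                               # odd group carried over
            merged.append(groups[-1])
        groups = merged
    return groups[0]  # indices of the global support vectors
```

Because each level only forwards support vectors, every merge step trains on a small candidate set rather than the full data, which is what lets the Hadoop jobs at each level run in parallel with bounded memory.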
Keywords/Search Tags: Support Vector Machine, Fuzzy Support Vector Machine, Imbalanced Data, Hadoop, Big Data