
Research On Hadoop Based Fuzzy Support Vector Machine

Posted on: 2016-08-30    Degree: Master    Type: Thesis
Country: China    Candidate: L Liu    Full Text: PDF
GTID: 2308330473965419    Subject: Computer application technology
Abstract/Summary:
The support vector machine (SVM) is a machine learning algorithm grounded in statistical learning theory; in practical applications it embodies the principle of structural risk minimization. SVM handles difficult problems such as nonlinearity, high dimensionality, and over-fitting well, and its strong learning ability has led to wide application in speech recognition, face recognition, text categorization, and other fields. However, much of the information in the real world is fuzzy, and training an SVM on samples that carry fuzzy information skews the precision of the classification results. The fuzzy support vector machine (FSVM) was proposed to address this problem, and FSVM has recently become an active research topic.

Current FSVMs cannot train imbalanced samples properly: their membership functions do not reflect the importance of each sample precisely and may lead to classification errors. To address these problems, an improved FSVM is proposed. An imbalance factor based on the ratio of the positive class to the negative class is introduced, and in the design of the membership function, samples are first separated into outliers, noise points, support vectors, and inner vectors according to sample density; combined with a distance factor, each sample is then assigned a different fuzzy membership. Experiments show that the improved FSVM performs better on imbalanced data containing many outlier and noise points.

FSVM is a complex algorithm that can take a long time to train, especially on large datasets. To address this, a Hadoop-based FSVM is proposed. The method exploits Hadoop's efficiency in processing large datasets and uses a cascade model to design the MapReduce jobs. First, a proportional partition algorithm divides the original data into subsets; the proposed FSVM then trains each subset to obtain its support vectors; finally, the support vectors of pairs of subsets are merged and retrained, and this step is repeated until the global support vectors are obtained. This divide-and-conquer strategy suits large datasets well: it balances the use of limited resources and shortens training time. Experiments on a small Hadoop cluster show that the algorithm greatly reduces training time without reducing the precision of the classification model.
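The density- and distance-based membership design can be sketched as follows. The density thresholds, neighbour count `k`, and the 0.1/0.9/0.5 weights are illustrative assumptions, not values taken from the thesis; the imbalance factor here simply scales the majority class down by the class ratio.

```python
import numpy as np

def fuzzy_memberships(X, y, k=3, low=0.2, high=0.6):
    """Assign fuzzy memberships from sample density and distance to the
    class centre. Thresholds and weights are illustrative assumptions."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n_pos, n_neg = int(np.sum(y == 1)), int(np.sum(y == -1))
    # imbalance factor: scale the majority class down by the class ratio
    factor = {1: min(1.0, n_neg / n_pos), -1: min(1.0, n_pos / n_neg)}
    m = np.empty(len(X))
    for cls in (1, -1):
        idx = np.where(y == cls)[0]
        Xc = X[idx]
        centre = Xc.mean(axis=0)
        # density: inverse mean distance to the k nearest same-class points
        D = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        D.sort(axis=1)
        kk = min(k, len(idx) - 1)
        dens = 1.0 / (D[:, 1:kk + 1].mean(axis=1) + 1e-12)
        dens = (dens - dens.min()) / (dens.max() - dens.min() + 1e-12)
        # distance factor: samples nearer the class centre weigh more
        dist = np.linalg.norm(Xc - centre, axis=1)
        dist_f = 1.0 - dist / (dist.max() + 1e-12)
        base = np.where(dens < low, 0.1 * dist_f,        # outliers / noise
               np.where(dens < high, 0.9,                # candidate support vectors
                        0.5 + 0.5 * dist_f))             # dense inner vectors
        m[idx] = factor[cls] * base
    return m
```

Under this scheme, an isolated far-away point receives a near-zero membership, so it contributes little to the FSVM objective, while dense points near the class centre keep weights close to one.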
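The cascade training procedure above can be sketched in-process as follows. The sub-trainer here is a crude stand-in that merely keeps, per class, the points nearest the opposite class's centroid; a real implementation would train the improved FSVM on each subset inside MapReduce jobs and return its support vectors.

```python
import numpy as np

def proportional_partition(y, n_parts):
    """Split sample indices into n_parts subsets, each preserving the
    original class ratio (the 'proportional partition' step)."""
    parts = [[] for _ in range(n_parts)]
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        for i, chunk in enumerate(np.array_split(idx, n_parts)):
            parts[i].extend(chunk.tolist())
    return [np.array(p) for p in parts]

def toy_sv_selector(X, y):
    """Stand-in for FSVM training: keep, per class, the two points nearest
    the opposite class's centroid. A real implementation would train the
    improved FSVM and return its support-vector indices."""
    keep = []
    for cls in np.unique(y):
        other_centroid = X[y != cls].mean(axis=0)
        idx = np.where(y == cls)[0]
        d = np.linalg.norm(X[idx] - other_centroid, axis=1)
        keep.extend(idx[np.argsort(d)[:2]].tolist())
    return np.array(sorted(keep))

def cascade_train(X, y, n_parts, train_svs=toy_sv_selector):
    """Cascade scheme: train each partition, then repeatedly merge pairs of
    support-vector sets and retrain until one global set remains."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    groups = proportional_partition(y, n_parts)
    groups = [g[train_svs(X[g], y[g])] for g in groups]   # level-0 training
    while len(groups) > 1:
        merged = []
        for a, b in zip(groups[::2], groups[1::2]):       # pairwise merge
            pair = np.concatenate([a, b])
            merged.append(pair[train_svs(X[pair], y[pair])])
        if len(groups) % 2:                               # odd group carried over
            merged.append(groups[-1])
        groups = merged
    return groups[0]  # indices of the global support vectors
```

Because each level only forwards support vectors, every merge step trains on a small candidate set rather than the full data, which is what lets the Hadoop jobs at each level run in parallel with bounded memory.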
Keywords/Search Tags: Support Vector Machine, Fuzzy Support Vector Machine, Imbalanced Data, Hadoop, Big Data