
Optimization And Parallelization Of The Cascade SVM

Posted on: 2019-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: P H Guo
Full Text: PDF
GTID: 2348330569489992
Subject: Software engineering
Abstract/Summary:
Purpose — As a parallel SVM (Support Vector Machine), the Cascade SVM improves the training efficiency of SVMs on large-scale data through global problem decomposition, non-support-vector filtering, and feedback. In most applications the Cascade SVM reaches a model of acceptable accuracy, while in a few applications the model must be obtained through multiple feedback iterations. Judging from the final training results, the model accuracy and stability of the Cascade SVM still fall short of single-machine training. To address this problem, a cross-validation Cascade SVM is proposed, and, to implement this parallel SVM efficiently, both the Cascade SVM and the cross-validation Cascade SVM are parallelized on a distributed computing platform.

Design/methodology/approach — In the implementation of the Cascade SVM, the initial random partition and the algorithm for merging two SVMs each influence the final training results to a different degree. Based on analysis and experiments, these two aspects are extended and optimized separately. For the parallelization, an implementation based on the Spark platform is adopted.

Findings — First, the initial random partition of the Cascade SVM may reduce the number of final global support vectors in extreme partitioning situations. A restricted random partition algorithm is therefore proposed that removes the influence of the initial partition on the final model by constraining every subset to the same proportion of positive and negative samples. Second, for merging two SVMs, a cross-validation merging algorithm is proposed that considers the "special points" besides the support vectors, i.e., the points at which the non-support vectors of one subset violate the training results of the other subset. Combining these two extensions yields the cross-validation Cascade SVM. Finally, the parallel training of the Cascade SVM and the cross-validation Cascade SVM is implemented on the Spark platform, and experiments on different data sets verify the effectiveness and stability of the cross-validation Cascade SVM.

Practical implications — The study of how the initial partition affects the accuracy of the final model is still at a preliminary level, and the distribution of the resulting subsets is not theoretically guaranteed to match that of the original data. Moreover, although the parallelization guarantees that multiple subsets are trained in parallel, parallelizing the training algorithm used inside each subset remains to be studied and extended.

Originality/value — A restricted random partition algorithm is proposed to avoid the loss of global support vectors under extreme partitioning; the original merging algorithm is extended with cross-validation, yielding the cross-validation merging algorithm; together these two optimizations produce the cross-validation Cascade SVM. The parallel training and prediction processes are implemented on Spark, realizing the parallelization of both the traditional Cascade SVM and the cross-validation Cascade SVM.
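To make the restricted random partition concrete, a minimal Python sketch follows. The function name and the round-robin dealing strategy are illustrative assumptions, not the thesis's code; the idea is simply to shuffle each class separately and deal its samples in turn so that every subset preserves the full data set's positive/negative ratio.

import numpy as np

def restricted_random_partition(X, y, n_subsets, seed=0):
    """Split (X, y) into n_subsets random subsets whose positive/negative
    ratios all match the ratio of the full data set (binary labels +1/-1)."""
    rng = np.random.default_rng(seed)
    subsets = [[] for _ in range(n_subsets)]
    # Shuffle each class separately, then deal its samples round-robin
    # so every subset receives (almost) the same share of each class.
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        rng.shuffle(idx)
        for i, sample in enumerate(idx):
            subsets[i % n_subsets].append(sample)
    return [(X[np.array(s)], y[np.array(s)]) for s in subsets]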
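The cross-validation merging step can be sketched with scikit-learn as below. The concrete test for a "special point" (a non-support vector whose margin under the other subset's model is violated, y * f(x) < 1) and the use of binary +1/-1 labels are assumptions made for illustration, not the thesis's exact criterion.

import numpy as np
from sklearn.svm import SVC

def cross_validation_merge(Xa, ya, Xb, yb, C=1.0, kernel="rbf"):
    """Merge two cascade subsets: keep each subset's support vectors plus
    the 'special points' -- non-support vectors of one subset that violate
    the margin of the OTHER subset's model -- then retrain on the union."""
    svm_a = SVC(C=C, kernel=kernel).fit(Xa, ya)
    svm_b = SVC(C=C, kernel=kernel).fit(Xb, yb)

    def keep(X, y, own, other):
        sv = np.zeros(len(y), dtype=bool)
        sv[own.support_] = True                       # own support vectors
        special = y * other.decision_function(X) < 1  # violate the other model
        return X[sv | special], y[sv | special]

    Xa2, ya2 = keep(Xa, ya, svm_a, svm_b)
    Xb2, yb2 = keep(Xb, yb, svm_b, svm_a)
    X = np.vstack((Xa2, Xb2))
    y = np.concatenate((ya2, yb2))
    return SVC(C=C, kernel=kernel).fit(X, y)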
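Finally, one layer of the cascade on Spark might look like the sketch below: the subsets are trained in parallel on the executors, only their support vectors are collected, and the survivors are merged pairwise to form the next (half-sized) layer. Running scikit-learn inside Spark tasks and the pairwise merging helper are assumptions for illustration, not the thesis's implementation.

import numpy as np
from pyspark.sql import SparkSession
from sklearn.svm import SVC

def train_and_filter(subset):
    """Train an SVM on one subset and keep only its support vectors."""
    X, y = subset
    model = SVC(kernel="rbf").fit(X, y)
    return X[model.support_], y[model.support_]

def cascade_layer(sc, subsets):
    """One cascade layer: train all subsets in parallel on Spark executors,
    then pair up the surviving support vectors for the next layer
    (assumes an even number of subsets)."""
    survivors = sc.parallelize(subsets, len(subsets)).map(train_and_filter).collect()
    return [(np.vstack((a[0], b[0])), np.concatenate((a[1], b[1])))
            for a, b in zip(survivors[0::2], survivors[1::2])]

if __name__ == "__main__":
    sc = SparkSession.builder.appName("cascade-svm").getOrCreate().sparkContext
    # Typical driver loop, given data already partitioned as above:
    # subsets = restricted_random_partition(X, y, n_subsets=8)
    # while len(subsets) > 1:
    #     subsets = cascade_layer(sc, subsets)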
Keywords/Search Tags: Machine Learning, Cascade SVM, Initial Partition, Cross Validation, Parallelization