Research On Parallel SVM Algorithm Based On Flink Platform

Posted on: 2022-07-02

Degree: Master

Type: Thesis

Country: China

Candidate: Y X Bai

GTID: 2518306524955869

Subject: Computer technology

Abstract/Summary:

With the advent of the big data era, information technology has entered a new stage of development that affects every aspect of social production and daily life. Smart mobile devices and smart home products continuously generate data, and considerable value is hidden in these massive data sets. Traditional data mining algorithms train inefficiently on large-scale data sets, so improving them has become urgent. Compared with other commonly used classification algorithms in data mining, the support vector machine (SVM) is less prone to overfitting and to the curse of dimensionality caused by a large number of attribute features, so these have little effect on its performance. With a suitable kernel function, the algorithm can also handle data sets that are not linearly separable. However, a traditional single-machine SVM cannot process large data sets efficiently: slow training, memory overflow, and crashes occur at run time. To address the low efficiency of single-machine SVM on large-scale data sets, this thesis combines the algorithm with parallel computing and the mainstream big data computing framework Flink, and designs a parallel SVM algorithm based on the Flink platform. Experiments show that training speed improves substantially and training time is reduced with only a small loss of accuracy. The main content of the thesis is as follows:

(1) To address the slow optimization speed of the single-machine global grid search algorithm, the global parameter grid is divided, following the big data "divide and conquer" idea, into several small pieces that are sent to the computing nodes for parallel optimization. The results are then aggregated and the optimal parameters are selected. Compared with the single-machine global grid search algorithm, the Flink parallel grid search algorithm increases the optimization speed and reduces the optimization time.

(2) Combining the advantages and disadvantages of the cascade and grouped-training SVM algorithms, a parallel SVM algorithm based on Flink is designed. The algorithm is optimized by tuning the performance of the parallel operators and by introducing distributed broadcast variables, which effectively addresses the low training efficiency of single-machine SVM. With a slight loss of accuracy, training speed is greatly improved and training time is effectively reduced.

(3) The shortcomings of the work and research in this thesis are summarized, and the prospect of real-time machine learning is discussed.
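The divide-and-conquer grid search described in item (1) can be illustrated with a minimal sketch. This is not the thesis's Flink implementation: a local thread pool stands in for the computing nodes, and `toy_score` is a hypothetical placeholder for the cross-validated accuracy of an SVM trained with a given (C, gamma) pair. All function names here are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
import math


def toy_score(c, gamma):
    """Hypothetical stand-in for cross-validated SVM accuracy.

    Peaks at C = 10, gamma = 0.1; a real run would train and validate an SVM here.
    """
    return 1.0 / (1.0 + (math.log10(c) - 1.0) ** 2 + (math.log10(gamma) + 1.0) ** 2)


def split(items, n_parts):
    """Divide the parameter grid into n_parts roughly equal pieces (round-robin)."""
    return [items[i::n_parts] for i in range(n_parts)]


def best_in_chunk(chunk, score):
    """Local optimization on one piece: return the best (score, C, gamma) tuple."""
    return max((score(c, g), c, g) for c, g in chunk)


def parallel_grid_search(c_values, gamma_values, score, n_workers=4):
    """Search each piece in parallel, then aggregate the local optima."""
    grid = list(product(c_values, gamma_values))
    chunks = [ch for ch in split(grid, n_workers) if ch]  # drop empty pieces
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        local_best = list(pool.map(lambda ch: best_in_chunk(ch, score), chunks))
    return max(local_best)  # global optimum among the local optima


best_score, best_c, best_gamma = parallel_grid_search(
    [0.1, 1.0, 10.0, 100.0], [0.001, 0.01, 0.1, 1.0], toy_score
)
```

Because each piece is evaluated independently and only the local optima are compared at the end, the result matches a sequential search over the full grid. On a cluster, the map step would presumably run as a parallel operator and the final `max` as a reduce.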

Keywords/Search Tags:

SVM, Data mining, Flink, Big Data, parallel computing

Related items

1	The Research About K-Means Parallel And Task Scheduling On Flink
2	An Interactive Data Mining And Visualization System Using Parallel Computing
3	Research On Resource Scheduling Method Based On Flink Framework Of Computing On Data Stream
4	Privacy Protection Technology Of Big Data Based On Flink Platform
5	Research On Data Provenance System Based On Flink Platform
6	Research Of Key Technologies Of Data Mining Telecommunication Oriented
7	TCM Data Mining Platform And The Services
8	Parallel Data Mining Theory Research And Application
9	Research About Data Mining Technologies Based On Cloud Computing
10	Fast processing of Web log data using parallel computing