Massive Data Pretreatment And Large-scale SVM Algorithm Research

Posted on:2018-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:X Yang

Full Text:PDF

GTID:2348330518995308

Subject:Information and Communication Engineering

Abstract/Summary:


The value of statistically analyzing vast amounts of historical data to discover useful information was recognized decades ago. However, applications now generate such large quantities of data that storing the data first and analyzing it afterwards is costly, and it is especially impractical for real-time analysis. A common approach is therefore to pretreat the data as it arrives and store summaries for later use. A typical way to use the pretreated data is to train a model that makes predictions. The Support Vector Machine (SVM) is an efficient model based on structural risk minimization, which helps mitigate overfitting. However, training an SVM model is time-consuming, and how to train it in parallel has been an active research area.

We proposed a series of solutions to these problems, including the design of a distributed real-time counting system and the development of an SVM library. First, we devised a parallel counting algorithm and analyzed its memory usage and estimation error, to ensure that the algorithm scales while keeping its error within a bound. Second, using this algorithm, we built and evaluated a practical system based on Storm, a real-time computing system. Experiments show that the system delivers high throughput and scales well. Finally, we studied the SVM algorithm and developed a parallel non-linear SVM library based on the distributed computing system Spark.
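The abstract does not say which counting algorithm the thesis uses. To illustrate the general idea of a parallel counting summary with bounded memory and a provable error bound, the sketch below uses the well-known Count-Min sketch. This is a swapped-in technique, not the author's algorithm, and it does not use Storm; the class name `CountMinSketch`, the parameters `eps` and `delta`, and the two-worker setup are all hypothetical. The property that matters for parallelism is that two sketches built with the same shape and hash functions can be merged by adding their tables cell by cell, so each worker can count its own partition of the stream independently.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate per-key counts in fixed memory (Count-Min sketch).

    Illustrative only; assumed here, not taken from the thesis.
    With width = ceil(e / eps) and depth = ceil(ln(1 / delta)), an estimate
    never undercounts, and overcounts by at most eps * N (N = total of all
    counts added) with probability at least 1 - delta.
    """

    def __init__(self, eps: float, delta: float) -> None:
        self.width = math.ceil(math.e / eps)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]

    def _index(self, key: str, row: int) -> int:
        # One hash function per row: BLAKE2b over the row id plus the key.
        # Deterministic across processes, unlike Python's salted hash().
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key: str) -> int:
        # Collisions only add to a cell, so every row overcounts;
        # the minimum across rows is the tightest estimate.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))

    def merge(self, other: "CountMinSketch") -> None:
        # Sketches with the same shape and hashes merge by cell-wise addition,
        # so parallel workers can each count one partition of the stream.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]


# Usage: two hypothetical workers split a stream round-robin, then merge.
stream = ["a", "b", "a", "c", "a", "b"] * 100
workers = [CountMinSketch(eps=0.01, delta=0.01) for _ in range(2)]
for i, item in enumerate(stream):
    workers[i % 2].add(item)
total = workers[0]
total.merge(workers[1])
print(total.estimate("a"))  # never below the true count of 300
```

With `eps=0.01` and `delta=0.01` the table holds 5 rows of 272 counters, and that memory stays fixed no matter how many distinct keys the stream contains; this is the kind of memory/error trade-off the abstract says was analyzed for its own algorithm.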

Keywords/Search Tags:

real-time counting algorithm, SVM, parallel algorithm, distributed computing system


Related items

1	Research On Scheduling Algorithm For Parallel Job Modeled By Directed Acyclic Graph
2	Researching And Implementation Of The Intelligent Scheduling Strategy For Distributed And Parallel Computing
3	Research On The Time Management Technology In Parallel And Distributed Simulation Systems
4	Research&Development Of Distributed Stream Real-time Computing Framework
5	The Parallel Algorithm Of Multiple Cameras Target Detection And Tracking Based On Distributed Cluster
6	Design And Implementation Of Distributed Real-time Video Target Tracking System Based On Stream Computing
7	Design And Implementation Of A Multi-channel Real-time Monitoring Video Data Processing And Analysis System
8	Distributed Quantum Counting Algorithm Research
9	Research Of Parallel FDTD Algorithm Based On LAN
10	Research On Superpixel Segmentation And Fast Implementation Methods