Massive Data Pretreatment And Large-scale SVM Algorithm Research

Posted on: 2018-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
Full Text: PDF
GTID: 2348330518995308
Subject: Information and Communication Engineering
Abstract/Summary:
The benefit of statistically analyzing vast amounts of historical data to discover useful information was recognized decades ago. However, as applications generate very large quantities of data, it is costly to store the data first and analyze it afterwards, and this approach is especially impractical for real-time analysis. A common approach is therefore to pretreat the data first and store summaries for future use. A common way to use the pretreated data is to train a model to make predictions.

The Support Vector Machine (SVM) is an efficient model based on structural risk minimization, which can mitigate over-fitting. However, training an SVM model is time-consuming, and how to train an SVM model in parallel has been an active research area.

We propose a series of solutions to the problems above, including the design of a distributed real-time counting system and the development of an SVM library. First, we devised a parallel counting algorithm and analyzed its memory usage and inaccuracy to ensure that the algorithm scales while keeping the error within bounds. Second, using this algorithm, we built and evaluated a practical system based on Storm, a real-time computing system. Experiments show that the system delivers high throughput as well as scalability. Finally, we studied the SVM algorithm and developed a parallel non-linear SVM library based on the distributed computing system Spark.
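The abstract does not say which counting algorithm the thesis uses. As a purely illustrative sketch of what a parallel counting summary with bounded memory and bounded error can look like, the block below uses a Count-Min sketch, which is an assumption on our part and not necessarily the author's method. The names `CountMinSketch` and `count_partitions`, and the parameters `epsilon` and `delta`, are hypothetical. Each partition is summarized independently, as a worker might do, and the summaries are merged by adding their tables.

```python
import hashlib
import math


class CountMinSketch:
    """Fixed-size frequency summary (illustrative; not taken from the thesis).

    Estimates never fall below the true count. With probability at least
    1 - delta, the overestimate is at most epsilon * total.
    """

    def __init__(self, epsilon=0.01, delta=0.01):
        # Memory depends only on epsilon and delta, not on the stream length.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, row, item):
        # One deterministic hash per row, derived by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count
        self.total += count

    def estimate(self, item):
        # The smallest cell across rows is the least-contaminated estimate.
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )

    def merge(self, other):
        # Sketches with equal dimensions merge by element-wise addition,
        # which is what makes per-worker counting possible.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions differ")
        for row in range(self.depth):
            for col in range(self.width):
                self.table[row][col] += other.table[row][col]
        self.total += other.total
        return self


def count_partitions(partitions, epsilon=0.01, delta=0.01):
    """Summarize each partition separately, then merge the summaries."""
    merged = CountMinSketch(epsilon, delta)
    for part in partitions:
        local = CountMinSketch(epsilon, delta)  # stands in for one worker
        for item in part:
            local.add(item)
        merged.merge(local)
    return merged
```

Because merging is plain addition, the merged sketch is identical to one built from the whole stream on a single machine, so splitting the work across workers does not change the error bound.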
Keywords/Search Tags: real-time counting algorithm, SVM, parallel algorithm, distributed computing system