Data Mining Engine Based On Big Data

Posted on: 2016-07-10 | Degree: Master | Type: Thesis
Country: China | Candidate: J Q Fan
GTID: 2298330467492104 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, ever larger amounts of data have been accumulated, and the scale of data sets has grown from gigabytes (GB) to terabytes (TB) and even petabytes (PB). The usual way to uncover the underlying value of such data is to apply data mining algorithms to it. These algorithms are mature on small data sets and have proven their value there. On big data sets, however, they face unprecedented challenges in efficiency, algorithm parallelization, and platform usability. This dissertation is an engineering project. It surveys a number of related open-source solutions and, based on that survey, adopts Spark as the core engine and programming paradigm, designs and implements parallel data mining algorithms on top of it, and builds an efficient, easy-to-use system. The dissertation covers the following work:
(1) It studies the two main parallel computing models for big data: the MapReduce programming model and the in-memory operator model. After comparing them in terms of efficiency, interface richness, and usability, it adopts in-memory computation and selects Spark as the core big data processing engine.
(2) It parallelizes two data mining algorithms, Apriori and PageRank, using the transformation and action operators provided by Spark, and verifies the efficiency and parallelism of the two parallel algorithms.
(3) It designs and implements a big data mining platform delivered as "Platform as a Service" (PaaS), which addresses usability, cross-platform support, and multi-user concurrency control.
Through this work, the dissertation provides a powerful tool for running data mining algorithms on big data sets.
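A minimal sketch of the operator pattern behind item (2), written in plain Python rather than Spark so that it runs on one machine. This is not the dissertation's implementation. The hypothetical helper `reduce_by_key` stands in for Spark's `reduceByKey(_ + _)`, and the generator expressions play the role of `flatMap`. The Apriori function counts candidate itemsets level by level (one flatMap, reduceByKey, filter pass per itemset size). The PageRank function assumes the unnormalized update rank = 0.15 + 0.85 * (sum of incoming contributions) used in Spark's bundled PageRank example, and it also assumes every page appears as a key of the link table.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def reduce_by_key(pairs: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Hypothetical single-machine stand-in for Spark's reduceByKey(lambda a, b: a + b)."""
    totals: Dict[Hashable, float] = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
    return totals


def apriori(transactions: Iterable[Iterable[str]], min_support: int) -> Dict[frozenset, int]:
    """Level-wise Apriori: each pass is flatMap -> reduceByKey -> filter."""
    baskets = [frozenset(t) for t in transactions]
    # Pass 1: emit (item, 1) for every item occurrence, sum the counts, keep frequent items.
    counts = reduce_by_key((frozenset([item]), 1) for basket in baskets for item in basket)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates if all(c - {x} in frequent for x in c)}
        # Counting pass: emit (candidate, 1) for each basket that contains the candidate.
        counts = reduce_by_key((c, 1) for basket in baskets for c in candidates if c <= basket)
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def pagerank(links: Dict[str, List[str]], iterations: int = 20,
             damping: float = 0.85) -> Dict[str, float]:
    """PageRank in the style of Spark's example; assumes every page is a key of `links`."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        # flatMap: each page splits its current rank evenly among its out-links.
        contributions = (
            (dest, ranks[src] / len(dests)) for src, dests in links.items() for dest in dests
        )
        summed = reduce_by_key(contributions)
        # mapValues: rank = (1 - d) + d * (sum of incoming contributions).
        ranks = {page: (1 - damping) + damping * summed.get(page, 0.0) for page in links}
    return ranks


if __name__ == "__main__":
    baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
    print(apriori(baskets, min_support=2))
    print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

In Spark, every step above except the final result collection would be a lazy transformation on an RDD. The `reduceByKey` step is where the shuffle, and most of the parallel cost, occurs. This is why both algorithms are organized so that each iteration needs only one keyed aggregation.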
Keywords/Search Tags:Big Data, Data Mining, Spark