
Research And Implementation Of Unified Large Data Mining Service Platform Based On Spark MLlib

Posted on: 2018-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Lin
GTID: 2428330542987081
Subject: Computer technology
Abstract/Summary:
Internet information technology is developing ever faster. The volume of data accumulated in just a few seconds now far exceeds the gigabyte level, and storage requirements are rising rapidly from the terabyte scale to the petabyte scale. To uncover the potential commercial value in such large quantities of data and make that value visible, data mining algorithms are used to analyze the data and obtain the desired results. For datasets of up to a few tens of gigabytes, traditional single-machine mining offers a good solution, but running the algorithms on hundreds of gigabytes or more is very difficult: traditional methods consume a large share of server computing resources and take a long time to run, so execution efficiency runs into performance bottlenecks. The Apache Spark framework, a recent big data processing engine, has had its ability to process large datasets verified repeatedly by a number of research institutions. Whereas traditional data mining methods cannot handle massive data, Spark runs on a variety of distributed platforms and can combine all processing stages simply and with low overhead. This thesis presents a unified data mining service platform based on Spark MLlib. The specific work is as follows:
(1) A layered platform architecture in which each layer is designed on top of the Spark and Hadoop frameworks. The cluster environment is built on an existing OpenStack platform, and resource allocation and management for distributed parallel computing are implemented. From bottom to top, the layers are the communication layer, the cloud foundation layer, the analysis and mining layer, and the visualization layer.
(2) The communication layer acquires data from each device, and the platform collects data through an externally exposed data-receiving interface. The cloud foundation layer builds a Spark cluster, together with YARN and other resource managers, on the OpenStack virtual cloud platform, providing the underlying resources for distributed computing and storage.
(3) The mining layer comprises a workflow management module, a data preprocessing module, batch data mining, real-time data mining, and a unified mining interface module. The back end of these modules follows a layered design managed as Maven projects and applies several object-oriented design patterns so that the platform satisfies the principles of extensibility and interface segregation. The MLlib-based machine learning algorithms are configurable and transparent to callers, and the adapter pattern is used to give them a unified interface.
(4) A unified external interface is provided and published as a service to facilitate third-party calls, with data transmitted through HTTP interfaces, web service interfaces, remote RMI calls, the management console port, and so on. In addition, the visualization layer provides a web-based visual process designer with which users can quickly and efficiently design a data mining process and then run the mining according to that design.
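The adapter-based unified interface for configurable MLlib algorithms mentioned above can be illustrated with a small sketch. It is a hypothetical illustration, not the thesis's implementation. The `MiningAlgorithm` interface, the `REGISTRY` lookup, and the two toy engines (`ToyKMeansEngine`, `ToyZScoreEngine`) are all assumptions. They are written in plain Python so the example runs without a Spark cluster. In a Spark-based platform, the adapted classes would presumably be Spark MLlib algorithms.

```python
import math
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence, Tuple

Point = Sequence[float]


def closest(centres: List[List[float]], p: Point) -> int:
    """Index of the centre nearest to p (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(centres[i], p))


class ToyKMeansEngine:
    """Stand-in clustering library with its own native API (hypothetical, not MLlib)."""

    def cluster(self, points: List[Point], k: int, iterations: int) -> List[List[float]]:
        centres = [list(p) for p in points[:k]]  # deterministic init: first k points
        for _ in range(iterations):
            groups: List[List[Point]] = [[] for _ in centres]
            for p in points:
                groups[closest(centres, p)].append(p)
            # Move each centre to the mean of its group; an empty group keeps its centre.
            centres = [
                [sum(c) / len(g) for c in zip(*g)] if g else centres[i]
                for i, g in enumerate(groups)
            ]
        return centres


class ToyZScoreEngine:
    """Stand-in anomaly-detection library with a different native API (hypothetical)."""

    def fit_stats(self, values: List[float]) -> Tuple[float, float]:
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return mean, std


class MiningAlgorithm(ABC):
    """Unified interface every algorithm is exposed through (hypothetical)."""

    @abstractmethod
    def train(self, data: List[Point], params: Dict[str, Any]) -> None: ...

    @abstractmethod
    def predict(self, point: Point) -> Any: ...


class KMeansAdapter(MiningAlgorithm):
    """Adapts ToyKMeansEngine.cluster() to train()/predict()."""

    def __init__(self) -> None:
        self._engine = ToyKMeansEngine()
        self._centres: List[List[float]] = []

    def train(self, data, params):
        self._centres = self._engine.cluster(data, params.get("k", 2), params.get("iterations", 10))

    def predict(self, point):
        return closest(self._centres, point)  # cluster index


class ZScoreAdapter(MiningAlgorithm):
    """Adapts ToyZScoreEngine.fit_stats() to train()/predict(); uses only the first feature."""

    def __init__(self) -> None:
        self._engine = ToyZScoreEngine()
        self._mean, self._std, self._threshold = 0.0, 0.0, 3.0

    def train(self, data, params):
        self._mean, self._std = self._engine.fit_stats([row[0] for row in data])
        self._threshold = params.get("threshold", 3.0)

    def predict(self, point):
        if self._std == 0:
            return point[0] != self._mean  # any deviation from a constant series is anomalous
        return abs(point[0] - self._mean) / self._std > self._threshold  # True = outlier


# Algorithms are chosen by name from configuration; callers never see engine APIs.
REGISTRY = {"kmeans": KMeansAdapter, "zscore": ZScoreAdapter}


def build(name: str) -> MiningAlgorithm:
    return REGISTRY[name]()


model = build("kmeans")
model.train([[1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [8.8, 9.2]], {"k": 2, "iterations": 5})
print(model.predict([9.1, 9.0]) == model.predict([8.8, 9.2]))  # True
```

Callers such as a workflow step or an HTTP handler depend only on `train`/`predict` and a configuration name. Under this design, adding an algorithm means writing one adapter and one registry entry, and existing callers stay unchanged.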
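The abstract describes a workflow management module together with a visual process designer. One plausible way to combine them is to store a designed process as a graph of named steps and execute each step after the steps it depends on. The sketch below assumes that representation: a dictionary of nodes, each with an `op`, its `inputs`, and optional `params`. The operator names (`load`, `drop_missing`, `min_max_scale`, `column_means`) and the in-memory row lists are hypothetical. In a Spark-based platform, each step would operate on distributed datasets instead. Execution order comes from `graphlib.TopologicalSorter` in the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Optional

Row = List[Optional[float]]


def load(rows: List[Row]) -> List[Row]:
    """Source step: copies the rows supplied in the node's params."""
    return [list(r) for r in rows]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Preprocessing: discard rows that contain a missing (None) value."""
    return [r for r in rows if all(v is not None for v in r)]


def min_max_scale(rows: List[Row]) -> List[Row]:
    """Preprocessing: rescale every column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]


def column_means(rows: List[Row]) -> List[float]:
    """Toy 'mining' step: per-column mean."""
    return [sum(c) / len(c) for c in zip(*rows)]


OPERATORS: Dict[str, Callable[..., Any]] = {
    "load": load,
    "drop_missing": drop_missing,
    "min_max_scale": min_max_scale,
    "column_means": column_means,
}


def run_workflow(spec: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Run each node after all of its inputs; graphlib raises CycleError on cycles."""
    graph = {name: node.get("inputs", []) for name, node in spec.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        node = spec[name]
        args = [results[dep] for dep in node.get("inputs", [])]
        results[name] = OPERATORS[node["op"]](*args, **node.get("params", {}))
    return results


# Nodes are listed out of order on purpose; execution order comes from the dependencies.
workflow = {
    "stats": {"op": "column_means", "inputs": ["scaled"]},
    "scaled": {"op": "min_max_scale", "inputs": ["clean"]},
    "clean": {"op": "drop_missing", "inputs": ["raw"]},
    "raw": {"op": "load", "params": {"rows": [[1.0, 10.0], [None, 5.0], [3.0, 30.0], [2.0, 20.0]]}},
}
results = run_workflow(workflow)
print(results["stats"])  # [0.5, 0.5]
```

Because the designer would only emit data (node names, operator names, parameters), the same engine can run any process a user draws, provided every `op` is registered in `OPERATORS`.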
Keywords/Search Tags: Big Data, Data Mining, Spark Platform, Distributed Computing, Machine Learning