
Research And Implementation Of Distributed Machine Learning Algorithms Orchestration System For Big Data Processing

Posted on: 2018-09-23 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Z He
GTID: 2348330518496611 | Subject: Computer technology
Abstract/Summary:
With the rapid development of computer technology and the internet industry, data volumes have grown explosively, and we have entered the era of big data. Massive data sets contain a great deal of valuable knowledge, and machine learning is a key technology for extracting useful information from them. However, the learning cost and entry threshold of machine learning are relatively high: data analysts need statistical knowledge as well as data modeling, algorithm design, and programming skills. To lower this threshold for business personnel, a general-purpose, easy-to-use, high-performance distributed processing tool for big data analysis is needed.

This thesis designs and implements a distributed machine learning algorithm orchestration system. The system provides users with an easy-to-use machine learning service in a distributed environment and lowers the threshold for applying machine learning algorithms. It allows users to analyze data and make predictions without writing programs. Users can complete operations such as data import, data parsing, model training, and prediction smoothly and intuitively through the Web interface, which provides an interactive machine learning service for developers and business analysts.

The OpenStack cloud platform, as the underlying environment of the system, provides flexible and scalable computing and storage resources. A Hadoop cluster is built on the cloud platform: YARN (Yet Another Resource Negotiator) provides parallel computing capability, HDFS (Hadoop Distributed File System) stores the massive data, and Spark, on the upper layer, provides more efficient in-memory computing for machine learning algorithms that require iterative computation.
For the machine learning algorithms, the system implements the core algorithm module in the data processing layer and provides classic algorithms for classification, regression, clustering, and so on. For algorithm orchestration, the system implements the workflow management module in the business logic layer, which provides the business logic for workflow scheduling. In the presentation layer, the interactive component module implements all interaction between the user and the Web page.

The test results show that all functions of the system work correctly and that the system offers simple, friendly algorithm orchestration. The algorithm performance of the system meets the expected requirements.
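
The abstract does not say how the workflow management module schedules its steps. As a rough illustration only, the sketch below assumes a workflow is a dependency graph over the four operations named above (data import, parsing, model training, prediction) and runs them in topological order. All names here (`STEPS`, `DEPENDS_ON`, `run_workflow`, the toy data, and the one-parameter model) are hypothetical and are not taken from the thesis; the real system runs its algorithms on Spark rather than in plain Python.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def import_data(ctx):
    # Hypothetical stand-in for loading a file from HDFS.
    ctx["raw"] = ["1,2", "2,4", "3,6"]


def parse_data(ctx):
    # Turn each "x,y" line into a tuple of floats.
    ctx["rows"] = [tuple(float(v) for v in line.split(",")) for line in ctx["raw"]]


def train_model(ctx):
    # Toy model: least-squares slope of a line through the origin.
    num = sum(x * y for x, y in ctx["rows"])
    den = sum(x * x for x, _ in ctx["rows"])
    ctx["slope"] = num / den


def predict(ctx):
    # Apply the trained slope to a new input, x = 4.
    ctx["prediction"] = ctx["slope"] * 4.0


# Step name -> function implementing it.
STEPS = {
    "import_data": import_data,
    "parse_data": parse_data,
    "train_model": train_model,
    "predict": predict,
}

# Step name -> set of steps that must finish first.
DEPENDS_ON = {
    "parse_data": {"import_data"},
    "train_model": {"parse_data"},
    "predict": {"train_model"},
}


def run_workflow(steps, depends_on):
    """Run every step once, after all of its dependencies have run."""
    ctx = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        steps[name](ctx)
    return order, ctx


if __name__ == "__main__":
    order, ctx = run_workflow(STEPS, DEPENDS_ON)
    print(order)
    print(ctx["prediction"])
```

Keeping the dependency graph separate from the step functions lets the same scheduler run any workflow a user assembles, which is the kind of separation an orchestration layer needs.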
Keywords/Search Tags: Machine learning, Distributed computing, Spark workflow