
Optimization Of Distributed Machine Learning Programming Model Based On Dependency Of Data Access

Posted on: 2017-05-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Cao
Full Text: PDF
GTID: 2348330503989897
Subject: Computer application technology
Abstract/Summary:
In the era of big data, the value of data is becoming ever more important. The central task of machine learning is to infer a model from data. As data volumes grow explosively, such machine learning tasks become harder to carry out, because no single compute node can solve large-scale machine learning problems on its own. Both the dataset and the CPU-intensive workload must therefore be distributed across compute nodes. Recently, several distributed machine learning systems built on the parameter server architecture have demonstrated state-of-the-art performance. However, those systems expose raw (key, value) access methods to programmers, burdening them with system-level optimization, task partitioning, and task parallelization in order to achieve good performance.

We propose a new distributed machine learning framework that also builds on the parameter server architecture but provides a high-level interface to developers. The interface decouples the detailed access methods for parameters and tensors from the application's core logic. We also break a machine learning task down into a series of stages. These high-level interfaces not only greatly simplify the writing of machine learning programs but also give system designers opportunities to perform system-level optimizations such as task partitioning and task parallelization.

To verify our prototype system, we build a document analysis application based on Latent Dirichlet Allocation (LDA), a topic model from the text-mining area. Results demonstrate that our system scales both on a single node and in a distributed setting, and that the time spent at barriers is no more than 15% of the total training time. In short, our system is scalable and incurs low overhead while providing a simplified interface and reusable building blocks for large-scale machine learning applications.
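To make the contrast concrete, the sketch below illustrates the difference between raw (key, value) parameter-server access and a staged, high-level interface of the kind the abstract describes. This is a minimal self-contained toy, not the thesis's actual API: every name here (ParameterServer, sgd_step_low_level, run_stages) is a hypothetical illustration.

```python
# Hypothetical sketch: raw (key, value) access vs. a staged interface.
# All class and function names are illustrative assumptions.

class ParameterServer:
    """Toy in-memory stand-in for a distributed parameter server."""
    def __init__(self):
        self.store = {}

    def pull(self, key):
        return self.store.get(key, 0.0)

    def push(self, key, delta):
        self.store[key] = self.store.get(key, 0.0) + delta


# Low-level style: the programmer manages keys, pulls, and pushes.
def sgd_step_low_level(ps, x, y, lr=0.1):
    w = ps.pull("w")              # fetch the current parameter
    grad = 2 * (w * x - y) * x    # gradient of (w*x - y)^2 w.r.t. w
    ps.push("w", -lr * grad)      # send the update back


# High-level style: the task is declared as a series of stages, and the
# framework, not the programmer, walks the stages, hands each worker its
# data shard, and (conceptually) barriers between stages.
def run_stages(stages, shards, ps):
    for name, fn in stages:
        for shard in shards:      # conceptually parallel across workers
            fn(ps, shard)
        # <- in a real distributed runtime, a barrier would sit here


if __name__ == "__main__":
    ps = ParameterServer()
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # samples of y = 2x
    stages = [("train", lambda ps, s: sgd_step_low_level(ps, *s))]
    for _ in range(50):           # repeat the training stage
        run_stages(stages, data, ps)
    print("learned w:", round(ps.pull("w"), 3))   # converges to 2.0
```

In the low-level style, every access to shared state is explicit; in the staged style, the runtime is free to repartition shards and reschedule stages, which is where the system-level optimizations described above become possible.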
Keywords/Search Tags:Distributed Computing, Machine Learning, Programming Model, Dependency of Data Access