
Research On Parallelization Of LSTM-based Time Series Prediction Algorithm

Posted on: 2021-03-17    Degree: Master    Type: Thesis
Country: China    Candidate: C Z Meng    Full Text: PDF
GTID: 2428330623968157    Subject: Software engineering
Abstract/Summary:
When faced with increasingly large data sets or large-scale distributed cluster environments, the LSTM algorithm for time series prediction and analysis exposes obvious training-efficiency problems. This thesis therefore proposes a parallelized LSTM solution. The solution relies on the big-data processing and distributed resource-scheduling capabilities of the YARN framework, combined with the distributed framework design provided by TensorFlow: YARN serves as the top-level task-allocation and resource-scheduling module, TensorFlow serves as the middle layer that connects the underlying parallel LSTM algorithm to the YARN layer above, and the LSTM algorithm itself is parallelized by slice grouping. The thesis focuses on the adaptation scheme between the YARN layer and the TensorFlow layer. By modifying the original framework, YARN retains its resource-scheduling and task-allocation functions while TensorFlow is encapsulated inside YARN node containers, so that YARN can publish tasks to the lower-level TensorFlow processes and quickly provide the required resources. The parallel LSTM scheme groups the data by time slices: training is serial within a group and parallel between groups, and the results of one training round serve as the input of the next round, iterating repeatedly until the final training result is obtained. In addition, the thesis introduces the distributed architecture design of the TensorFlow framework as a bridge and customizes the data slice grouping so that it fits the parallelized LSTM algorithm. The load-balancing mechanism of the YARN framework is also customized: GPU occupancy is obtained by scanning with Pynvml, and the original YARN weight-calculation scheme is modified so that resource scheduling and load balancing take GPU occupancy into account. Furthermore, by recombining the computation into matrix operations, the thesis optimizes the LSTM execution process within the TensorFlow framework. Finally, a number of controlled experiments are designed to verify the usability of the above solutions when dealing with large-scale data and distributed cluster environments.
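The GPU-occupancy scan mentioned above can be illustrated with a minimal Python sketch using the Pynvml library. The function names and the combined weighting formula below are illustrative assumptions, not the thesis's exact implementation; only the pynvml calls themselves are standard.

```python
# Minimal sketch: scan local GPU utilization with pynvml and fold it into a
# hypothetical node weight for scheduling. Names and formula are assumptions.
import pynvml

def scan_gpu_utilization():
    """Return the average GPU utilization (0-100) across all local GPUs."""
    pynvml.nvmlInit()
    try:
        count = pynvml.nvmlDeviceGetCount()
        if count == 0:
            return 0.0
        total = 0.0
        for i in range(count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            total += util.gpu  # percentage of time the GPU was busy
        return total / count
    finally:
        pynvml.nvmlShutdown()

def node_weight(cpu_load, gpu_load, alpha=0.5):
    """Hypothetical combined weight: lower values mark a better scheduling target."""
    return alpha * cpu_load + (1.0 - alpha) * gpu_load

if __name__ == "__main__":
    gpu = scan_gpu_utilization()
    print("average GPU utilization:", gpu)
    print("node weight:", node_weight(cpu_load=30.0, gpu_load=gpu))
```

A scheduler that already ranks nodes by CPU and memory load could, in this spirit, simply add the GPU term to its existing weight before choosing where to place a TensorFlow container.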
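The time-slice grouping scheme (serial within a group, parallel between groups, each round's outputs feeding the next round) can likewise be sketched in plain Python. The stand-in `train_on_slice` below replaces the actual TensorFlow LSTM update, and all function names are assumptions made for illustration.

```python
# Minimal sketch of the grouping scheme: slices inside a group are processed
# serially, groups run in parallel, and round k's outputs seed round k+1.
from concurrent.futures import ProcessPoolExecutor

def train_on_slice(state, time_slice):
    """Stand-in for one serial LSTM training step on a single time slice."""
    # In the real system this would run an LSTM update in TensorFlow;
    # here we just fold the slice into the carried state.
    return state + sum(time_slice)

def train_group(state, group):
    """Serial training over the slices of one group."""
    for time_slice in group:
        state = train_on_slice(state, time_slice)
    return state

def parallel_rounds(groups, rounds=3):
    """Groups train in parallel; each round's results become the next round's input."""
    states = [0.0] * len(groups)
    with ProcessPoolExecutor() as pool:
        for _ in range(rounds):
            states = list(pool.map(train_group, states, groups))
    return states

if __name__ == "__main__":
    # Toy data: 4 groups, each holding 3 time slices of 5 points.
    data = [[[float(g + s + t) for t in range(5)] for s in range(3)]
            for g in range(4)]
    print(parallel_rounds(data))
```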
Keywords/Search Tags:Big Data, YARN, TensorFlow, LSTM, Load balancing