| When faced with increasingly large data sets or a large-scale distributed cluster environment, the LSTM algorithm for time series prediction and analysis exposes obvious training efficiency problems. This paper therefore proposes a parallelized LSTM solution. The solution relies on the big data processing and distributed resource scheduling capabilities of the YARN framework, together with the distributed framework design provided by the TensorFlow architecture. The two are combined: the YARN framework serves as the top-level task allocation and resource scheduling module, and TensorFlow serves as the middle layer that connects the parallel LSTM algorithm at the bottom with the YARN architecture at the top. The LSTM algorithm itself is parallelized by slice grouping. The article focuses on the adaptation scheme between the YARN layer and the TensorFlow layer. The original YARN framework is modified while retaining its resource scheduling and task allocation functions, and TensorFlow is encapsulated in the node containers of the YARN framework. Through the related mechanisms, YARN can publish tasks to the lower-level TensorFlow layer and quickly provide the required resources. The parallel scheme for the LSTM algorithm groups the data by time slice, with serial training within each group and parallel training between groups; the results of one round of training serve as the input to the next round, and this is repeated over multiple iterations until the final training result is obtained. In addition, the article uses the distributed architecture design of the TensorFlow framework as a bridge and customizes the data slice grouping method to adapt it to the parallelized LSTM algorithm. The article also customizes the load balancing mechanism of the YARN framework: the GPU occupancy rate is obtained by scanning with pynvml, and the weight calculation of the original YARN framework is modified so that resource scheduling and load balancing take the GPU occupancy rate into account. Furthermore, by combining matrix operations, the article optimizes the LSTM computation process in the TensorFlow framework. Finally, the article designs a number of controlled experiments and verifies the usability of the above solutions when dealing with large-scale data and distributed cluster environments. |
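
The time-slice grouping scheme described above (serial within a group, parallel between groups, each round's results feeding the next round) can be illustrated with a minimal sketch. This is not the paper's implementation: a toy `tanh` recurrence stands in for an LSTM cell, only the forward state is propagated rather than trained weights, and the names `split_into_groups`, `run_group`, and `grouped_rounds`, the weights `w` and `u`, and the rule that group g's result becomes the initial state of group g+1 in the following round are all assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(series, group_size):
    """Split a time series into consecutive groups of time slices."""
    return [series[i:i + group_size] for i in range(0, len(series), group_size)]


def run_group(group, h_init, w=0.5, u=0.3):
    """Process one group serially; a toy recurrence stands in for an LSTM cell."""
    h = h_init
    for x in group:
        h = math.tanh(w * x + u * h)
    return h


def grouped_rounds(series, group_size, rounds):
    """Run all groups in parallel for several rounds.

    Assumed hand-off rule: the final state of group g in one round
    becomes the initial state of group g+1 in the next round.
    """
    groups = split_into_groups(series, group_size)
    init_states = [0.0] * len(groups)
    final_states = list(init_states)
    with ThreadPoolExecutor() as pool:
        for _ in range(rounds):
            # Groups are independent within a round, so they can run in parallel.
            final_states = list(pool.map(run_group, groups, init_states))
            # Results of this round become the inputs of the next round.
            init_states = [0.0] + final_states[:-1]
    return final_states


series = [0.1 * t for t in range(12)]
serial_result = run_group(series, 0.0)          # one fully serial pass
parallel_result = grouped_rounds(series, group_size=4, rounds=3)[-1]
```

In this toy setting, the state of the last group agrees with the fully serial pass once the number of rounds reaches the number of groups, because each round makes the initial state of one more group exact. Whether the paper's scheme uses the same hand-off rule or stopping criterion is not stated in the abstract.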