
Research On Heterogeneous Distributed LSTM For Video Semantic Analysis

Posted on: 2020-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Liu
Full Text: PDF
GTID: 2428330596496916
Subject: Computer Science and Technology
Abstract/Summary:
Recurrent Neural Networks (RNNs) have attracted great research interest from a variety of fields, such as natural language processing, speech recognition, machine translation, and video analysis. Through the feedback connections between hidden layers, an RNN can memorize the historical information of its input and is particularly well suited to sequential data processing. As a special kind of sequential data, video content contains complex relationships that an RNN can learn and mine for video semantic understanding. However, existing RNNs ignore the temporal extent of action semantics in a video segment when modeling video sequences, which leads to large amounts of complex, time-consuming matrix computation in the hidden neurons at every time step. In addition, as models grow in size and depth and the volume of video data increases exponentially, training time increases dramatically. RNN training involves extensive matrix calculation, characterized by high complexity, large time cost, and demanding hardware requirements, and it is difficult to improve its efficiency with existing neural-network acceleration methods. Based on a variant of the RNN, the Long Short-Term Memory (LSTM) network, this thesis proposes a heterogeneous distributed LSTM for video semantic analysis. The main contributions are as follows:

(1) We first survey related research on RNN/LSTM for video semantic analysis and argue that existing RNN/LSTM models cannot effectively exploit the characteristics of video sequences and that their training is inefficient. To improve training efficiency, we propose a heterogeneous distributed LSTM architecture that supports efficient video semantic analysis.

(2) To overcome the limitations of existing LSTMs in modeling video sequences, a distributed LSTM model is proposed. First, a duration-aware LSTM (D-LSTM) is designed so that the LSTM unit can perceive and memorize the duration of motion semantics in video clips. D-LSTM adaptively updates the cell memory and avoids redundant calculation when handling videos that contain multiple motion semantics of different durations. On this basis, a distributed training algorithm for D-LSTM is proposed that simulates the parallel operation of neurons in a biological neural system. The distributed D-LSTM adopts a neuron-centered strategy, decomposing the complex matrix operations into multiple linear operations carried out in parallel on distributed neuron nodes. A prototype system is implemented on a Spark cluster and evaluated on two video datasets, Charades and COIN. Compared with a distributed LSTM model, the maximum training efficiency and the convergence speed of the distributed D-LSTM model improve by 22.7% and 8.7%, respectively; compared with a GPU acceleration method, the maximum training efficiency improves by 79.3%; and accuracy improves by 1% compared with the traditional LSTM model.

(3) To further improve computational efficiency, a distributed training method based on a heterogeneous client/server (C/S) architecture is proposed. After analyzing the characteristics of the computation tasks on the neuron interaction node and the distributed neuron nodes, a C/S-based cooperation strategy for GPU and CPU is designed, in which the computation tasks are decomposed and distributed between GPU and CPU: the complex matrix operations are placed on a GPU server, while the other, simpler computations are deployed on a CPU cluster, exploiting the respective strengths of each processor. A heterogeneous distributed D-LSTM training method combining GPU and CPU is then presented: a GPU is added to the neuron interaction node to handle the complex matrix operations, while CPUs perform the remaining computations in the neuron interaction node and the operations in the distributed neuron nodes. A prototype system is implemented and tested on the Charades and COIN datasets. The experimental results show that, compared with the distributed D-LSTM model, the maximum training speed improves by 17.6% and the maximum convergence speed by 13%, while accuracy remains unchanged.
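The abstract does not give the D-LSTM update equations, but the duration-aware idea in contribution (2) — carrying the cell memory through a continuing motion semantic instead of recomputing the gates at every step — can be sketched as follows. The cell layout, the `change_threshold` parameter, and the input-similarity test used as the "duration" signal are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_lstm_step(x, h, c, params, change_threshold=0.1):
    """One step of a duration-aware LSTM-style cell (illustrative sketch).

    If the current input frame feature is close to the previous one (i.e.
    the motion semantic is still continuing), the full gate computation is
    skipped and the cell memory is carried over unchanged, avoiding the
    redundant matrix work the thesis identifies.
    """
    W, U, b, x_prev = params["W"], params["U"], params["b"], params["x_prev"]

    # Duration check: has the semantic content changed enough to recompute?
    if x_prev is not None and np.linalg.norm(x - x_prev) < change_threshold:
        params["x_prev"] = x
        return h, c  # carry memory through the duration of the semantic

    # Standard LSTM gates: input, forget, output, candidate (stacked in W/U)
    z = W @ x + U @ h + b
    n = h.size
    i = sigmoid(z[:n])
    f = sigmoid(z[n:2 * n])
    o = sigmoid(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    params["x_prev"] = x
    return h_new, c_new
```

Feeding the cell the same input twice makes the second step return the previous state untouched, which is the behavior that saves the per-step matrix computation on long-duration semantics.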
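The neuron-centered strategy of the distributed training algorithm — decomposing a matrix operation into independent per-neuron linear operations — can be illustrated with a matrix-vector product. In the thesis each per-neuron dot product would run on a distributed neuron node of the Spark cluster; the thread pool below stands in for those nodes purely to show that the decomposition is embarrassingly parallel, and is not the thesis's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def neuron_centered_matvec(W, x, max_workers=4):
    """Compute W @ x one output neuron at a time.

    Each row of W belongs to one neuron, so its dot product with x can be
    evaluated independently of every other row. Threads here merely mimic
    the distributed neuron nodes described in the thesis.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rows = list(pool.map(lambda w_j: float(w_j @ x), W))
    return np.array(rows)
```

Because the per-neuron results are independent, the only synchronization point is reassembling the output vector, which corresponds to the neuron interaction node gathering results in the heterogeneous C/S design of contribution (3).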
Keywords/Search Tags:Long Short-Term Memory, Distributed Neuron, Heterogeneous Distributed System, Duration-aware, Video Semantic Analysis