
Research On Incremental LSTM Based On Sparse Distribution Activation

Posted on: 2021-01-20    Degree: Master    Type: Thesis
Country: China    Candidate: Z Xia    Full Text: PDF
GTID: 2428330629987257    Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of artificial intelligence technology and the explosive growth of data, efficiently and accurately processing and analyzing ever-growing data streams has become a challenging task. Existing deep neural networks are usually trained on static batches of data and therefore cannot handle dynamically growing data effectively; retaining all historical data preserves learning quality, but places enormous pressure on storage and computation. Recurrent Neural Networks (RNNs) are deep learning models well suited to analyzing and modeling data streams, as they can capture temporal dependencies in sequence data. Their variant, Long Short-Term Memory (LSTM), has been successfully applied to sequence processing tasks such as machine translation and speech recognition, and is the most widely used recurrent architecture. However, existing LSTM models cannot adapt effectively to the dynamic growth of a data stream, and conventional training methods suffer from "catastrophic forgetting" (CF). To improve the capacity to analyze and process growing volumes of data and to enhance the usability of LSTM in real scenarios, this thesis builds on the existing LSTM model and, targeting the memory-forgetting problem in incremental learning over data streams, studies an incremental LSTM based on sparse distribution activation. The main research contents are as follows:

(1) Related work on the incremental training of deep neural networks is analyzed. To address the inability of current LSTM models to adapt to the dynamic change and growth of data streams, and their large training time overhead, the overall architecture of an incremental LSTM system based on sparse distribution activation is proposed, providing technical support for efficiently processing massive amounts of data.

(2) From the perspective of model structure, the "plasticity-stability" dilemma of fixed-structure deep neural networks and the resulting "memory forgetting" problem in incremental learning are analyzed, and an LSTM model based on sparse distribution activation is proposed. The neurons of the cell state c_t and the hidden state h_t in the LSTM unit are grouped separately; according to their activation values, some neurons are selected to be active while their neighboring neurons are suppressed. Compared with existing sparse regularization methods, the introduced suppression radius not only restricts activations to be sparse but also ensures that the activated neurons are evenly distributed, improving the stability of the LSTM model. A corresponding prototype was implemented and evaluated on data sets such as PTB and Permuted MNIST. The experimental results show that, compared with the LSTM Dropout method, the accuracy of the sparse-distribution-activated LSTM on Permuted MNIST improves by up to 12.9%, and the perplexity on the language modeling task is reduced by at most 4.631.
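The following is a minimal sketch of the kind of sparse distribution activation described in (2), assuming a PyTorch LSTMCell. The function names, the suppression radius, and the number of active neurons k are illustrative assumptions, not the exact design of the thesis.

    import torch

    def sparse_distribution_mask(state, radius, k):
        # state: (batch, hidden_size). Keep at most k neurons per sample, chosen
        # greedily by |activation| and spaced more than `radius` positions apart,
        # so the active neurons stay both sparse and evenly distributed.
        mask = torch.zeros_like(state)
        order = state.abs().argsort(dim=1, descending=True)   # strongest neurons first
        for b in range(state.shape[0]):
            kept = []
            for idx in order[b].tolist():
                # suppress any neuron within the radius of an already-kept neuron
                if all(abs(idx - j) > radius for j in kept):
                    kept.append(idx)
                    if len(kept) == k:
                        break
            mask[b, kept] = 1.0
        return mask

    def sparse_lstm_step(cell, x_t, h_t, c_t, radius=4, k=32):
        # One recurrent step: run a standard torch.nn.LSTMCell, then mask the
        # cell state c_t and hidden state h_t separately, as in part (2).
        h_next, c_next = cell(x_t, (h_t, c_t))
        c_next = c_next * sparse_distribution_mask(c_next, radius, k)
        h_next = h_next * sparse_distribution_mask(h_next, radius, k)
        return h_next, c_next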
(3) From the perspective of training methods, the characteristics of the BPTT (Back Propagation Through Time) algorithm used to train LSTM are analyzed, in particular the parameter transfer and gradient update rules during incremental learning over data streams. To address the large space and time overhead of storing all historical data, and the limitation that BPTT propagates gradients back over only a fixed number of time steps, an incremental training method based on compression and memory consolidation is proposed (a rough sketch follows at the end of this summary). Guided by the LSTM forget gate, important data in the historical sequence are preserved and compressed, and important parameters from earlier training are carried over when training on new data, so that historical information is not completely "lost" over time. In addition, memory consolidation is added to the training process to push gradient information beyond the BPTT step-size limit and update the parameters at every time step, further preventing "catastrophic forgetting". A corresponding prototype was implemented and evaluated on data sets such as PTB and Permuted MNIST. The experimental results show that, compared with the iCaRL incremental learning algorithm, accuracy improves by up to 16.9% and perplexity is reduced by up to 4.899.

(4) Combining the above sparse distribution LSTM with the improved incremental training method, an incremental LSTM prototype system based on sparse distribution activation is designed and implemented, and its functional modules and business processes are described. The prototype is then applied to the Changzhou subway passenger flow prediction task. The prediction results on the Changzhou subway passenger flow data set show that the proposed incremental LSTM system based on sparse distribution activation can effectively perform incremental learning on passenger flow data and yields more accurate predictions.
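As a rough illustration of the compression-and-consolidation idea from part (3), the sketch below keeps only the historical time steps with the highest forget-gate activations and adds a quadratic penalty anchoring the current parameters to those learned on earlier data. The function names, the keep ratio, and the penalty form are assumptions made for illustration, not the thesis's exact formulation.

    import torch

    def compress_history(inputs, forget_gates, keep_ratio=0.2):
        # inputs: (seq_len, batch, input_size); forget_gates: (seq_len, batch, hidden_size).
        # Keep only the time steps whose mean forget-gate activation is highest,
        # i.e. the steps the LSTM itself chose to retain.
        keep = max(1, int(inputs.shape[0] * keep_ratio))
        importance = forget_gates.mean(dim=(1, 2))            # one score per time step
        idx = importance.topk(keep).indices.sort().values     # keep chronological order
        return inputs[idx]

    def consolidation_penalty(model, old_params, strength=1.0):
        # old_params: dict of parameter tensors saved after training on earlier data.
        # Quadratic penalty pulling current weights toward previously learned values,
        # so new batches do not overwrite old knowledge.
        penalty = 0.0
        for name, p in model.named_parameters():
            penalty = penalty + ((p - old_params[name]) ** 2).sum()
        return strength * penalty

    # Usage during incremental training (hypothetical):
    # total_loss = task_loss + consolidation_penalty(model, old_params)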
Keywords/Search Tags:Long Short-Term Memory, Incremental Learning, Catastrophic Forgetting, Sparse Distribution, Memory Consolidation