
Research On Key Technologies Of RNN Algorithms Optimization And Hardware Acceleration

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Gao
Full Text: PDF
GTID: 2428330620453191
Subject: Information and Communication Engineering
Abstract/Summary:
The continuous improvement of computer science and of hardware information-processing capabilities has provided a steady stream of power for the development of artificial intelligence. Natural language processing (NLP) has received extensive attention as an important means of human-computer interaction, and the recurrent neural network (RNN) is an algorithm commonly used in NLP. In recent years, thanks to its inherent ability to learn temporal dependencies, the RNN has gradually replaced traditional algorithms in the complex application scenarios and under the strict computing-performance requirements of the big-data era, achieving significant practical results in machine translation, text classification, and speech recognition. Because of the large scale and complex network structure of recurrent neural networks, their computational complexity and time complexity are generally higher than those of traditional algorithms. Although existing research often parallelizes them on hardware platforms, it still has the following deficiencies:

(1) A recurrent neural network feeds past time-step information back in as part of the current input, resulting in high model computation latency. Existing research often parallelizes recurrent neural networks at the cost of large resource consumption, and the hardware acceleration architecture is not designed around the characteristics of the model, resulting in poor compatibility with the hardware platform.

(2) Sample data is usually distributed with redundant values. These values have little effect on the final state update of the model, so the hardware acceleration architecture performs redundant operations. Existing research has not focused on the impact of sample-data redundancy on the computational overhead of hardware acceleration architectures.

(3) To better learn sequence information, the model's weight matrices are generally high-dimensional. High-dimensional weight matrices consume too many storage resources during hardware acceleration, resulting in poor real-time data loading.

In light of the above problems, this thesis aims to reduce the computation latency of RNN acceleration architectures and proposes corresponding solutions. The following research results are obtained:

1. For the problems of high RNN computation latency and poor compatibility between existing architectures and hardware platforms, a computing-architecture design scheme based on the Roofline model is proposed. By modeling the compute-to-communication ratio and bandwidth with the Roofline model, the existing parallel matrix-vector operation mode is optimized. Furthermore, parameter fixed-point quantization, pipelining, and data-storage techniques are combined to improve the accelerator's compute-to-communication ratio. Simulation experiments show that the parallel computing architecture effectively reduces computation latency; it fits the hardware platform better and achieves higher energy efficiency than existing research.

2. To address the large number of invalid operations in the parallel acceleration architecture, caused by the limited effect of redundant sample data on the RNN's state updates, a hardware acceleration design scheme based on sample-data redundancy is proposed. The numerical similarity of the data is used to construct sparse sample data, and a numerical threshold is set to filter out redundant sample data and further reduce the computational overhead of the hardware acceleration architecture. Experiments on the MNIST standard dataset show that when the numerical threshold does not exceed 0.5, the detection accuracy of the model does not change, while the computational overhead is effectively reduced.

3. To address the problems that the model's high-dimensional weight matrices occupy too many storage resources and that the weight-loading speed cannot keep pace with the hardware's computing ability, a weight-matrix compression method based on singular value decomposition (SVD) is proposed. The singular values of the high-dimensional weight matrix are extracted with the SVD algorithm; then, using an energy ratio derived from the distribution of the singular values, a low-dimensional structure of the high-dimensional matrix is adaptively found, achieving dimensionality reduction of the weight matrix. Experiments show that, without reducing the model's detection performance, up to about 40% weight-parameter compression can be achieved.
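The abstract does not give the thesis's actual Roofline figures, but the core idea of result 1 can be sketched as follows: attainable throughput is bounded either by peak compute or by memory bandwidth times operational intensity (FLOPs per byte moved). The numbers below are purely hypothetical accelerator parameters, not values from the thesis.

```python
# Roofline model sketch: a kernel's attainable performance is the minimum of
# the platform's peak compute rate and (bandwidth x operational intensity).
def roofline_attainable(peak_gflops, bandwidth_gbs, operational_intensity):
    """Return attainable GFLOP/s under the Roofline model.

    operational_intensity is the kernel's compute-to-communication
    ratio in FLOPs per byte of off-chip traffic.
    """
    return min(peak_gflops, bandwidth_gbs * operational_intensity)

# Hypothetical figures for illustration only. A dense matrix-vector product
# (the core RNN operation) performs ~2 FLOPs per weight loaded, so its
# operational intensity is low and it is typically memory-bound -- which is
# why techniques like fixed-point quantization (fewer bytes per weight)
# raise the attainable performance.
peak = 500.0   # hypothetical peak throughput, GFLOP/s
bw = 25.0      # hypothetical off-chip bandwidth, GB/s
oi_gemv = 1.0  # FLOPs per byte for a 16-bit GEMV kernel (illustrative)
print(roofline_attainable(peak, bw, oi_gemv))  # memory-bound: 25.0
```

Under this model, halving the bytes per weight via quantization doubles the operational intensity and, for a memory-bound kernel, doubles the attainable throughput.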
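Result 2's threshold-based redundancy filtering can be illustrated with a minimal sketch: input elements whose change since the previous time step falls below a numerical threshold are treated as unchanged, so the corresponding multiply-accumulates can be skipped. The function name, the vectors, and the skipping strategy are illustrative assumptions, not the thesis's exact scheme.

```python
import numpy as np

def sparsify_by_threshold(x_prev, x_curr, threshold=0.5):
    """Zero out the *change* in input elements whose update since the last
    time step is below the threshold, so an accelerator can skip the
    corresponding multiply-accumulate operations.

    Returns the effective input and a mask of elements that must be
    recomputed (True) versus skipped as redundant (False).
    """
    delta = x_curr - x_prev
    mask = np.abs(delta) >= threshold
    # Below-threshold elements are treated as unchanged (redundant).
    x_effective = np.where(mask, x_curr, x_prev)
    return x_effective, mask

# Illustrative two consecutive input vectors.
x_prev = np.array([0.10, 0.90, 0.50, 0.05])
x_curr = np.array([0.12, 0.20, 0.55, 0.80])
x_eff, mask = sparsify_by_threshold(x_prev, x_curr, threshold=0.5)
# Only elements 1 and 3 changed by >= 0.5, so only those two propagate;
# the other two keep their previous values and their MACs can be skipped.
print(x_eff)
print(int(mask.sum()), "of", mask.size, "elements recomputed")
```

Here half of the input elements are filtered out as redundant; with a threshold of 0.5, the thesis reports no change in detection accuracy on MNIST while the computational overhead drops.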
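Result 3's energy-ratio rank selection can likewise be sketched: truncate the SVD at the smallest rank whose singular values capture a chosen fraction of the total spectral energy, then store the two low-rank factors instead of the full matrix. The energy ratio, matrix shape, and synthetic weight matrix below are illustrative assumptions.

```python
import numpy as np

def compress_by_energy(W, energy_ratio=0.99):
    """Truncate the SVD of W at the smallest rank whose singular values
    capture the requested fraction of total spectral energy (sum of
    squared singular values)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    rank = int(np.searchsorted(energy, energy_ratio)) + 1
    # Store two thin factors A (m x r) and B (r x n) instead of W (m x n).
    A = U[:, :rank] * s[:rank]
    B = Vt[:rank, :]
    return A, B, rank

rng = np.random.default_rng(0)
# Synthetic weight matrix with low-rank structure (true rank <= 8),
# standing in for a high-dimensional RNN weight matrix.
W = rng.standard_normal((128, 8)) @ rng.standard_normal((8, 128))
A, B, rank = compress_by_energy(W, energy_ratio=0.99)
# The adaptive search recovers a rank no larger than the true rank of 8,
# and the factors use far fewer parameters than the full 128 x 128 matrix.
storage_ratio = (A.size + B.size) / W.size
print(rank, round(storage_ratio, 3))
```

Because the discarded energy is at most 1% of the total here, the relative Frobenius reconstruction error is bounded by about 10%, while the factors store roughly `rank * (m + n)` parameters instead of `m * n`.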
Keywords/Search Tags: Recurrent neural network, Hardware acceleration, Roofline model, Redundancy, Singular value decomposition