Research On The Compression Of Neural Networks Based Acoustic Models For Speech Recognition

Posted on: 2019-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
Full Text: PDF
GTID: 2428330542494086
Subject: Information and Communication Engineering
Abstract/Summary:
Automatic speech recognition (ASR) is an important component of human-machine interaction and communication. Its main purpose is to make the machine "understand" human speech, that is, to convert the speech signal into text. The acoustic model (AM) plays a central role in speech recognition systems. Traditionally, acoustic models based on the Gaussian Mixture Model and Hidden Markov Model (GMM-HMM) were the mainstream choice. In recent years, with the rapid development of deep learning, acoustic models based on deep neural networks (DNN) have brought breakthrough performance improvements over the traditional GMM-HMM. However, DNN-based acoustic models contain a large number of parameters and incur significant computational cost, which makes them difficult to deploy on resource-constrained mobile devices. Compressing DNN-based acoustic models therefore aims to reduce the number of parameters and the computational complexity, facilitating the application of speech recognition systems on such devices.

This dissertation studies the compression of deep-neural-network-based acoustic models for speech recognition. First, targeting the number of model parameters, for DNN and fully convolutional neural network (FCNN) based acoustic models we propose an activation-mask-based method that analyzes and evaluates the importance of neurons during network training. Neurons that contribute little to the model's output can be removed, which automatically learns the number of neurons in each hidden layer and reduces the number of parameters in the network. For acoustic models based on recurrent neural networks (RNN) with long short-term memory (LSTM), we propose a moving gate that analyzes and evaluates the importance of the memory cells in the LSTM model; memory cells with little impact on the network's output are removed, compressing the scale of the network model. Experimental results show that both methods effectively reduce the width of the network while preserving speech recognition performance. Second, targeting parameter precision, for DNN- and LSTM-based acoustic models this dissertation explores the influence of fixed-point and integer representations on the performance of speech recognition systems. Experimental results show that fixed-point and integer quantization can effectively reduce parameter precision and parameter complexity without performance loss. Finally, exploiting the decoding-speed advantage of the connectionist temporal classification (CTC) model, the bidirectional LSTM-CTC model is compressed with the proposed moving gate. Experimental results show that the decoding speed of speech recognition is improved.
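The abstract does not include code; the following is a minimal NumPy sketch of activation-based neuron pruning for a fully connected layer. It assumes importance is measured by the mean absolute activation of each hidden unit over a calibration batch of speech frames, whereas the dissertation learns an activation mask during training, so this is a post-hoc approximation. All names (prune_hidden_layer, keep_ratio) are illustrative.

```python
import numpy as np

def prune_hidden_layer(W_in, b, W_out, activations, keep_ratio=0.5):
    """Prune hidden units with the smallest mean |activation|.

    W_in:        (n_in, n_hidden) weights into the hidden layer
    b:           (n_hidden,) biases of the hidden layer
    W_out:       (n_hidden, n_out) weights out of the hidden layer
    activations: (n_frames, n_hidden) hidden activations on a
                 calibration set of speech frames
    """
    # Importance score per neuron: average activation magnitude.
    importance = np.abs(activations).mean(axis=0)
    n_keep = max(1, int(keep_ratio * importance.size))
    keep = np.sort(np.argsort(importance)[-n_keep:])
    # Remove pruned neurons from both adjacent weight matrices.
    return W_in[:, keep], b[keep], W_out[keep, :]

# Toy usage: 40-dim acoustic features, 256 hidden units, 100 output states.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((40, 256))
b1 = rng.standard_normal(256)
W2 = rng.standard_normal((256, 100))
X = rng.standard_normal((1000, 40))      # calibration frames
H = np.maximum(X @ W1 + b1, 0.0)         # ReLU hidden activations
W1p, b1p, W2p = prune_hidden_layer(W1, b1, W2, H, keep_ratio=0.5)
print(W1p.shape, W2p.shape)              # (40, 128) (128, 100)
```

Because pruning a hidden unit removes both a column of the incoming weights and a row of the outgoing weights, halving the layer width roughly halves the parameters of both adjacent matrices.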
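The moving gate is the dissertation's own construction and its details are not given in the abstract, so the sketch below shows only the mechanical part of removing LSTM memory cells: given an importance score per cell (here a random placeholder), the four stacked gate blocks of a standard LSTM parameterization are sliced consistently. The gate ordering and the function name are assumptions.

```python
import numpy as np

def prune_lstm_cells(W, U, b, keep):
    """Remove memory cells from a standard LSTM layer.

    W:    (4*n_cell, n_in)   input weights, gate blocks in order [i, f, g, o]
    U:    (4*n_cell, n_cell) recurrent weights
    b:    (4*n_cell,)        biases
    keep: indices of memory cells to retain
    """
    n_cell = U.shape[1]
    # A cell owns one row in each of the four gate blocks.
    rows = np.concatenate([keep + g * n_cell for g in range(4)])
    # Slice gate rows, and the recurrent columns fed by the kept cells.
    return W[rows, :], U[np.ix_(rows, keep)], b[rows]

rng = np.random.default_rng(0)
n_in, n_cell = 40, 8
W = rng.standard_normal((4 * n_cell, n_in))
U = rng.standard_normal((4 * n_cell, n_cell))
b = rng.standard_normal(4 * n_cell)
# Placeholder scores; the dissertation derives these from its moving gate.
importance = rng.random(n_cell)
keep = np.sort(np.argsort(importance)[-4:])  # keep the 4 most important cells
Wp, Up, bp = prune_lstm_cells(W, U, b, keep)
print(Wp.shape, Up.shape, bp.shape)          # (16, 40) (16, 4) (16,)
```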
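For the precision experiments, the abstract does not specify the fixed-point format. The sketch below assumes symmetric signed 8-bit fixed-point quantization with a single per-matrix scale; bit width and scaling granularity are assumptions.

```python
import numpy as np

def quantize_fixed_point(W, n_bits=8):
    """Quantize W to signed n-bit fixed point with one per-matrix scale."""
    qmax = 2 ** (n_bits - 1) - 1             # e.g. 127 for 8 bits
    scale = np.max(np.abs(W)) / qmax         # one scale for the whole matrix
    Wq = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
    return Wq, scale

def dequantize(Wq, scale):
    return Wq.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)
Wq, s = quantize_fixed_point(W, n_bits=8)
err = np.max(np.abs(W - dequantize(Wq, s)))
print(f"max abs quantization error: {err:.4f}")
```

Storing int8 weights instead of float32 cuts the model size by a factor of four, which is consistent with the abstract's claim that reduced-precision parameters lower complexity without performance loss.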
Keywords/Search Tags: speech recognition, acoustic model, deep neural networks, model compression, parameter complexity