Construction And Experiment Of Acoustic Model Based On CNN

Posted on:2021-03-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y C Lu

Full Text:PDF

GTID:2428330602973804

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, with the rise of artificial intelligence, human-computer interaction technologies have begun to attract attention from many industries, and speech recognition, a key technology in human-computer interaction, has once again come into public view. During an internship at an IT company, the author found that the Deep Neural Network (DNN) based acoustic model used in the company's speech recognition system has three shortcomings: it is weak at capturing contextual information, its acoustic features lose some information, and label alignment must be performed before training. These problems have impeded further improvement of the system's recognition performance. To address them, this thesis studies the construction of acoustic models based on the Convolutional Neural Network (CNN) and optimizes the network parameters through experiments. The main work and innovations of this study are as follows:

1. A CNN-HMM acoustic model based on the short-time time-frequency spectrum is built. The CNN replaces the DNN in the DNN-HMM acoustic model, the speech signal is converted into a short-time time-frequency spectrum as the model input, and the model is trained accordingly. After training, its recognition performance is compared with that of a CNN-HMM acoustic model that takes Fbank acoustic features as input and with that of the DNN-HMM acoustic model. The results show that the CNN-HMM acoustic model based on the short-time time-frequency spectrum achieves higher recognition performance than the DNN-HMM acoustic model. When CNN-HMM is used as the acoustic model in both cases, using the short-time time-frequency spectrum as input also gives better recognition results than using Fbank features.

2. The influence of the number of convolutional layers and of the convolution kernel size on the recognition performance of the CNN-HMM acoustic model based on the short-time time-frequency spectrum is analyzed through experiments. CNN-HMM acoustic models with 2, 3, and 4 convolutional layers are built and trained with the short-time time-frequency spectrum as input, and their recognition performance is compared after training. Then, with the network structure held fixed, models with 2×2, 3×3, and 4×4 convolution kernels are trained and their recognition performance is compared. The test results show that, within the tested configurations, both increasing the number of convolutional layers and enlarging the convolution kernel improve the recognition performance of the acoustic model.

3. A CNN-CTC acoustic model is built by combining the CNN with connectionist temporal classification (CTC), which performs well on temporal classification tasks. This model takes the entire utterance as input and avoids the label alignment that traditional acoustic models require during training, which simplifies the training process. The test results show that, with both types of input, the CNN-CTC acoustic model simplifies training while also improving recognition performance compared with the CNN-HMM model. In addition, its decoding is somewhat faster than that of the CNN-HMM acoustic model.
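The reason CTC needs no frame-level label alignment can be illustrated with its collapse rule: many different frame-level label paths map to the same output sequence, so training only needs the target sequence itself. The sketch below is a minimal illustration, not the thesis's implementation. The function names, the blank index 0, and the use of greedy (best-label-per-frame) decoding are all assumptions made for this example; the abstract does not say which decoder was used.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level label path to an output sequence.

    Step 1: merge consecutive repeated labels.
    Step 2: drop blank symbols.
    """
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Pick the highest-scoring label in each frame, then collapse the path.

    frame_scores is a list of per-frame score lists, one score per label.
    Greedy decoding is an illustrative assumption, not the thesis's decoder.
    """
    path = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    return ctc_collapse(path, blank)


if __name__ == "__main__":
    # Two different frame-level paths that yield the same label sequence.
    print(ctc_collapse([1, 1, 0, 2, 2, 0]))
    print(ctc_collapse([0, 1, 0, 0, 2, 2]))
    # A blank between two identical labels keeps them as separate outputs.
    print(ctc_collapse([1, 0, 1]))
```

Here the paths `[1, 1, 0, 2, 2, 0]` and `[0, 1, 0, 0, 2, 2]` both collapse to `[1, 2]`, which is why no single alignment between frames and labels has to be fixed before training.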

Keywords/Search Tags:

Speech recognition, Acoustic model, Convolutional neural network, Short-time time-frequency spectrum, Connectionist temporal classification

Related items

1	Research On Connectionist Temporal Classification In Speech Recognition
2	Research On Speech Recognition Based On Convolutional Neural Networks
3	Research On End-to-end Speech Recognition Based On Convolutional Neural Networks
4	Amdo Tibetan Speech Recognition Based On Deep Neural Network
5	The Design And FPGA Verification Of End-to-end Mandarin Speech Recognition Based On CNN
6	Research On Speech Emotion Recognition Algorithm Based On Deep Learning
7	Research On Mandarin Speech Recognition Technology Based On Deep Neural Network
8	Chinese Speech Recognition System Based On CLDNN Hybrid Model
9	ASR Research Based On CTC
10	Research On Temporal Action Detection In Video