Research On Mandarin Speech Recognition Technology Based On Deep Neural Network

Posted on:2021-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:J B Huang

Full Text:PDF

GTID:2518306575951969

Subject:Computer technology

Abstract/Summary:


With the development of computer technology, human-computer interaction has gradually shifted from purely text-instruction interaction to interaction through multiple types of information, such as voice and images. Automatic speech recognition is the basis of speech interaction, and its purpose is to convert speech signals into natural-language text. Traditional automatic speech recognition systems include basic units such as speech preprocessing, a language model, a phoneme dictionary, an acoustic model, and a decoder. This thesis addresses the problem that traditional acoustic models learn the characteristics of speech poorly. Based on the characteristics of speech signals and the learning ability of neural networks, a new acoustic model scheme is proposed and the structure of the constructed model is optimized, which can improve the performance and efficiency of a Chinese automatic speech recognition system.

This thesis proposes three acoustic models: a Time Delay Neural Network (TDNN) Acoustic Model, a TDNN-LSTM Acoustic Model, and a Spectrum Augment Acoustic Model. The Time Delay Neural Network Acoustic Model is a structure built for the context-related characteristics of speech signals. The TDNN-LSTM Acoustic Model is a network constructed by adding long short-term memory (LSTM) structures to the time delay neural network acoustic model; it learns both context-related and long-term characteristics well. The Spectrum Augment Acoustic Model uses dynamic spectrum augmentation methods to optimize the input features of the TDNN-LSTM Acoustic Model.

Each of the three acoustic models is combined with an N-gram language model and a decoder to construct a Mandarin automatic speech recognition system, and the resulting systems are tested and verified on the Chinese speech corpus AISHELL. The experimental results show that the word error rate of the time delay neural network speech recognition system is 7.51%, which is 7.58% lower than that of the traditional GMM-HMM speech recognition system. The TDNN-LSTM speech recognition system achieves the same recognition performance as the time delay neural network system, but its training time is 16% shorter. The spectrum augment speech recognition system further reduces the word error rate of the TDNN-LSTM system by 1.42%. Based on these results, this thesis designs and implements an online Mandarin speech recognition system in browser/server (B/S) mode, and further experiments are conducted to verify the practicability of the proposed speech recognition models.
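
The results above are reported as word error rates. As a minimal sketch of how such a figure is usually computed, the Python below applies the standard definition: the edit distance (substitutions, deletions, insertions) between reference and recognized token sequences, divided by the number of reference tokens. The abstract does not say which scoring tool or token unit the thesis used, so the function names `edit_distance` and `word_error_rate`, the per-character tokenization, and the sample sentences are all assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] is the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Total edit errors divided by total reference tokens over a test set."""
    total_errors = 0
    total_tokens = 0
    for ref, hyp in zip(references, hypotheses):
        total_errors += edit_distance(ref, hyp)
        total_tokens += len(ref)
    if total_tokens == 0:
        raise ValueError("reference transcripts are empty")
    return total_errors / total_tokens


# Hypothetical transcripts, tokenized per character (an assumption).
references = [list("今天天气很好"), list("语音识别")]
hypotheses = [list("今天天汽很好"), list("语音识")]

# One substitution plus one deletion over 10 reference characters.
print(f"{word_error_rate(references, hypotheses):.2%}")  # prints 20.00%
```

Errors are pooled over the whole test set before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.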

Keywords/Search Tags:

Speech recognition system, Acoustic model, Time delay neural network, Long short-term memory network, Spectrum augment


Related items

1	Research On Uyghur Speech Recognition Based On Deep Learning
2	Long Short Term Memory Recurrent Neural Network Application To Handwritten Recognition
3	Analysis Of Effective Fused Features And Model Evaluation For Speech Emotion Recognition
4	Acceleration Gesture Recognition Based On Long-short Term Memory Network
5	Research And Application Of The Short-term Memory Network For Adjusting Gate Length
6	Amdo Tibetan Speech Recognition Based On Deep Neural Network
7	Design And Implementation Of Speech Recognition System Based On DNN-LSTM
8	Chinese Sign Language Recognition Based On Convolutional Network And Long Short Term Memory Network
9	Construction And Experiment Of Acoustic Model Based On CNN
10	Online Handwritten Math Expression Label Recognition Based On Long Short Term Memory Recurrent Neural Network