
Research On Mandarin Speech Recognition Technology Based On Deep Neural Network

Posted on: 2021-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: J B Huang
GTID: 2518306575951969
Subject: Computer technology
Abstract/Summary:
With the development of computer technology, human-computer interaction has gradually shifted from purely text-based instructions to multiple types of information, such as voice and images. Automatic speech recognition is the basis of speech interaction, and its purpose is to convert speech signals into natural-language text. Traditional automatic speech recognition systems include basic units such as speech preprocessing, a language model, a phoneme dictionary, an acoustic model, and a decoder. This paper addresses the problem that traditional acoustic models learn the characteristics of speech poorly. Based on the characteristics of speech signals and the learning ability of neural networks, it proposes a new acoustic-model scheme and optimizes the structure of the resulting model, which could improve the performance and efficiency of a Chinese automatic speech recognition system. Three acoustic models are proposed: the Time Delay Neural Network Acoustic Model, the TDNN-LSTM Acoustic Model, and the Spectrum Augment Acoustic Model. The Time Delay Neural Network Acoustic Model is built for the context-related characteristics of speech signals. The TDNN-LSTM Acoustic Model adds long short-term memory structures to the time delay neural network acoustic model, so it learns both context-related and long-term characteristics well. The Spectrum Augment Acoustic Model uses dynamic spectrum augment methods to optimize the input features of the TDNN-LSTM Acoustic Model. Each of the three acoustic models is combined with an N-gram language model and a decoder to build a Mandarin automatic speech recognition system, which is then tested and verified on the Chinese speech corpus AISHELL. The experimental results show that the word error rate of the time delay neural network speech recognition system is 7.51%, which is 7.58% lower than that of the traditional GMM-HMM speech recognition system. The TDNN-LSTM speech recognition system achieves the same recognition performance as the time delay neural network system, but its training time is 16% shorter. The spectrum augment speech recognition system further reduces the word error rate of the TDNN-LSTM system by 1.42%. Based on these results, this paper designs and implements a browser/server (B/S) online Mandarin speech recognition system, and further experiments verify the practicality of the proposed speech recognition models.
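The error rates quoted in the abstract follow the usual definition: the edit distance between the reference transcript and the recognizer output, divided by the reference length. The sketch below illustrates that definition only; it is not the thesis's scoring code. The function names `edit_distance` and `error_rate` are hypothetical, and scoring Mandarin per character (a common choice, but not stated in the abstract) is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences
    (substitutions, deletions and insertions each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match / substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Assumption: each Chinese character is one token.
reference = list("今天天气很好")
hypothesis = list("今天天汽很好")  # one substituted character
rate = error_rate(reference, hypothesis)
```

Because the rate is normalized by the reference length only, a hypothesis with many insertions can score above 100%.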
Keywords/Search Tags:Speech recognition system, Acoustic model, Time delay neural network, Long short-term memory network, Spectrum augment
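The abstract does not say how the dynamic spectrum augment methods work. A common form of spectrum augmentation masks a random band of frequency bins and a random run of frames in each training utterance, drawing new masks every time the utterance is seen. The sketch below shows that idea under those assumptions; the function `mask_features`, its parameters, and the fill value are hypothetical and are not taken from the thesis.

```python
import random


def mask_features(features, max_freq_width=2, max_time_width=2,
                  fill=0.0, rng=None):
    """Return a copy of `features` (a list of frames, each a list of
    frequency-bin values) with one random frequency band and one random
    run of frames overwritten by `fill`. Assumes a non-empty,
    rectangular input. Mask sizes are hypothetical defaults."""
    rng = rng or random.Random()
    num_frames = len(features)
    num_bins = len(features[0])
    out = [row[:] for row in features]  # leave the input untouched

    # Frequency mask: the same band of bins is blanked in every frame.
    f_width = rng.randint(0, min(max_freq_width, num_bins))
    f_start = rng.randint(0, num_bins - f_width)
    for row in out:
        for k in range(f_start, f_start + f_width):
            row[k] = fill

    # Time mask: a run of consecutive frames is blanked entirely.
    t_width = rng.randint(0, min(max_time_width, num_frames))
    t_start = rng.randint(0, num_frames - t_width)
    for t in range(t_start, t_start + t_width):
        out[t] = [fill] * num_bins

    return out


# Toy input: 6 frames x 5 bins, all ones, with a fixed seed.
features = [[1.0] * 5 for _ in range(6)]
augmented = mask_features(features, rng=random.Random(0))
```

Calling `mask_features` afresh for each training pass gives a different masked view of the same utterance, which is one way to read "dynamic" here.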