Research And Implementation Of End-to-end Speech Recognition System Based On CTC Method

Posted on:2020-11-08

Degree:Master

Type:Thesis

Country:China

Candidate:Y Lu

Full Text:PDF

GTID:2438330572487381

Subject:Computer Science and Technology

Abstract/Summary:

Automatic speech recognition is a key technology for making communication among people and machines smoother. With the gradual popularization of new social media, the amount of data on the Internet has increased significantly, which has greatly reduced the recognition efficiency of traditional speech recognition systems. In the traditional approach, the training corpus must be labeled not only with the text transcript but also with the corresponding phonemes in time order, which requires a large amount of labor. Neural network techniques can simplify speech recognition. Connectionist Temporal Classification (CTC) computes the probability over multiple label sequences, that is, over the set of all possible label sequences corresponding to a speech sample. Because the audio sequence is mapped directly to text, even the language model can be omitted, removing the need for separate standard language and acoustic models. This makes speech recognition independent of the language: a model can be trained as long as there are enough samples.

This thesis focuses on an end-to-end speech recognition system based on the CTC method. The main research contents are:

1) An in-depth study of the LSTM structure. The network structure of LSTMP is improved and a Re-dimension method is proposed, which allows the network to learn historical information autonomously. Experiments verify that this improves speech recognition accuracy.
2) Batch Normalization (BN) has previously been applied to DNN models. Here the BN algorithm is adapted so that it works on the LSTM network.
3) During neural network training, the Target Delay method is used to realize an adaptive CTC algorithm, so that the context of the unidirectional LSTM model is modeled accurately.

In summary, experiments were carried out on the collected data set. The results show that end-to-end speech recognition based on the CTC method can improve recognition efficiency and that, as the amount of data increases, it will surpass the performance of traditional speech recognition systems.
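
The way CTC sums probability over all frame-level label sequences that correspond to one transcript can be illustrated with a minimal sketch of the standard CTC forward recursion. This is not the implementation from the thesis: the function name `ctc_sequence_probability`, the input layout (one probability distribution per frame), and the choice of index 0 for the blank symbol are assumptions made for illustration.

```python
def ctc_sequence_probability(probs, labels, blank=0):
    """Probability of `labels` under CTC, summed over all valid alignments.

    probs  : list of T per-frame distributions; probs[t][k] is the
             probability of symbol k at frame t (assumed layout).
    labels : target label ids without blanks.
    blank  : id of the CTC blank symbol (assumed to be 0 here).
    """
    # Extended target: a blank before, between, and after every label.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    num_states, num_frames = len(ext), len(probs)

    # alpha[s] = total probability of all partial alignments that end
    # in extended-target position s at the current frame.
    alpha = [0.0] * num_states
    alpha[0] = probs[0][ext[0]]
    if num_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, num_frames):
        new_alpha = [0.0] * num_states
        for s in range(num_states):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping the blank between two labels is allowed only
            # when the two labels differ.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    result = alpha[-1]
    if num_states > 1:
        result += alpha[-2]
    return result
```

For example, with two frames, a two-symbol alphabet (blank and one label `1`), and uniform per-frame probabilities of 0.5, the call `ctc_sequence_probability([[0.5, 0.5], [0.5, 0.5]], [1])` sums the three alignments (label, blank), (blank, label), and (label, label), giving 0.75. A practical system would run the same recursion in log space to avoid underflow on long utterances.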

Keywords/Search Tags:

Speech Recognition, End-to-end, CTC, Batch Normalization, Target Delay

Related items

1	Research On Acoustic Target Recognition Method Based On Deep Learning
2	Duration normalization for robust recognition of spontaneous speech via missing feature methods
3	Feature Extraction Method Of Noisy Speech Based On EMD And Feature Normalization
4	Research On The Voiceprint Recognition Algorithm Based On Improved Time Delayed Neural Network In Noisy Environment
5	Time Delay Neural Network Based Automatic Speech Recognition
6	A Study Of An Irrelevant Variability Normalization Based Large Vocabulary Continuous Speech Recognition
7	Research On I-vector Based Speaker Normalization For Speech Recognition
8	Acoustic modeling and speaker normalization strategies with application to robust in-vehicle speech recognition and dialect classification
9	Neural dynamics of speech perception and production: From speaker normalization to apraxia of speech
10	A Study Of Several Problems On Noise Robust Speech Recognition