Air Traffic Control Speech Recognition Based On Deep Learning

Posted on: 2020-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: S F Zhang
GTID: 2428330572982437
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rise of deep learning, deep speech recognition has gradually replaced the traditional GMM-HMM speech recognition model and has become the mainstream approach in the field. Air Traffic Control (ATC) speech is the main form of communication between air traffic controllers and pilots, and ATC speech recognition plays an important role in traffic control systems and in real-time monitoring of air-ground communication. We have carried out a series of studies on ATC speech recognition methods. First, we design and implement an ATC speech recognition system for the case where only a small amount of annotated data is available. Because the training samples are insufficient, we use syllables as the modeling unit and build the acoustic model on BLSTM (Bidirectional Long Short-Term Memory) + CTC (Connectionist Temporal Classification). We then use a Transformer as the language model to convert syllables into words. Experiments show that the system achieves acceptable recognition performance. Second, for ATC speech data with a large number of annotations, we design and implement two acoustic models: FC-N-BLSTM+CTC (FC denotes a fully connected layer and N the number of FC layers) and DFCNN+CTC (Deep Fully Convolutional Neural Network). FC-3-BLSTM+CTC achieved a 9.6% character error rate, but the training and decoding times of the BLSTM method are relatively long. The character error rate of DFCNN+CTC was 0.5% higher than that of BLSTM+CTC, but its training and decoding times are shorter, which avoids the time cost that BLSTM incurs in training and decoding. Finally, we complete the bad-label selection task on X-ATC using FC-3-BLSTM+CTC and DFCNN+CTC, in order to avoid the time-consuming manual selection of bad samples. We then retrain the acoustic models on the corrected data and obtain a better result.
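As an illustration of two ideas the abstract relies on, the following minimal Python sketch shows how a per-frame CTC label path can be collapsed into an output sequence (greedy decoding) and how a character error rate can be computed from an edit distance. It is not the thesis implementation: the function names, the choice of blank index 0, and the use of greedy rather than beam-search decoding are all assumptions made for the example.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_collapse(frame_ids: Sequence[int], blank: int = 0) -> List[int]:
    """Collapse a per-frame best-label path the CTC way.

    Consecutive repeats are merged first, then blank symbols are dropped.
    The blank index (0) is an assumption of this sketch.
    """
    return [label for label, _ in groupby(frame_ids) if label != blank]


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance counting substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference symbols into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference symbol
                curr[j - 1] + 1,     # insertion of a hypothesis symbol
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance between hypothesis and reference, divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, the hypothetical frame path `[0, 1, 1, 0, 1, 2, 2, 0]` collapses to `[1, 1, 2]`: the blank between the two runs of label 1 is what lets a repeated symbol survive. A figure such as the 9.6% reported above corresponds to `character_error_rate` returning 0.096 when aggregated over a test set, although the thesis may aggregate or normalize differently.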
Keywords/Search Tags:ATC speech recognition, Deep Learning, Acoustic model, Language model