Research And Application On Multitask Speech Recognition Algorithm For Accented Dialogue Systems

Posted on: 2022-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: J A Ren
Full Text: PDF
GTID: 2518306605972599
Subject: Master of Engineering
Abstract/Summary:
Speech is the most convenient way for people to communicate. In recent years, the rapid growth of Artificial Intelligence has made it possible for machines to understand human language, and speech-based human-machine interaction has made daily life considerably easier. As speech technology has evolved, end-to-end speech recognition based on deep learning has become a research hotspot because of its simple modeling and high accuracy. In real scenes, however, noise and reverberation are inevitably present in speech. Conventional speech recognition algorithms map frames directly to characters, which makes them sensitive to the data distribution. In real scenes such as customer service conversations, slight disturbances can cause recognition errors. Moreover, conversations often contain strong dialect accents, which are difficult for a Mandarin-trained model to recognize. Training a model for a specific accent requires massive amounts of annotated accent data, which is very difficult to obtain in practice.

To solve these problems, this thesis studies robust Automatic Speech Recognition (ASR) algorithms based on multitask learning. First, a multitask ASR algorithm is proposed for Mandarin conversational speech recognition, which models the characters, Pinyin, and accent embedded in speech. Based on this algorithm, a multitask ASR model is designed that performs speech recognition, speech-to-Pinyin conversion, and accent classification simultaneously. To further strengthen the positive effect of the auxiliary tasks on speech recognition, this thesis uses a neural tensor network to model the correlation between characters and Pinyin, addressing cases in which the Pinyin is recognized correctly but the characters are not. In addition, a gradient reversal layer is introduced to make the model more robust to different accents, thereby boosting its performance. The experimental results show that the proposed method can perform all three tasks simultaneously, i.e., speech recognition, speech-to-Pinyin conversion, and accent classification. By modeling the relationship between the auxiliary tasks and the primary task, performance improves further, with relative Character Error Rate (CER) reductions of 11.21% and 10.56% on clear speech and customer service conversational speech, respectively.

To further improve the ASR model in specific accent domains, this thesis studies a cross-accent ASR transfer learning algorithm based on adversarial training. First, regularization terms are added to the model transfer through a gradient reversal layer, a character domain discriminator, and an accent domain discriminator, to prevent the high-dimensional audio features extracted by the model from overfitting to the target accent domain. The model can therefore learn a high-dimensional speech feature space shared across accent domains, which improves its generalization in the target accent domain. Then, adversarial transfer experiments are conducted with the Cantonese dialect accent, the Xiang dialect accent, the visitor accent, and the customer service accent as target domains. Specifically, the first two are close to the source domain distribution, while the other two differ from it. The results indicate that the proposed method achieves a relative CER reduction of 6.18% to 9.52% across these four target accent domains, outperforming the traditional domain adaptation algorithm.

Finally, an intelligent ASR system for customer service dialogue scenes is designed and implemented based on the aforementioned algorithms. Test results in practical scenes show that the system can accurately recognize the speech of customer service staff and of visitors with accents. In summary, the proposed ASR algorithm takes characters, Pinyin, and accents fully into account to improve model performance in accented conversational scenes, and can be applied in fields such as intelligent customer service, intelligent conferencing, and autonomous driving.
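The results in the abstract are reported as relative CER reductions. As a minimal sketch of how such a figure is typically computed, assuming the standard definition of CER (character-level edit distance divided by the number of reference characters), the Python below may help. The function names and the sample sentences are hypothetical illustrations, not taken from the thesis, and the thesis's own evaluation script may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(refs, hyps) -> float:
    """Corpus-level CER: total edit errors / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return errors / chars


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical reference and system outputs, for illustration only.
    refs = ["今天天气很好"]
    baseline_hyps = ["今天天汽很号"]
    improved_hyps = ["今天天气很号"]
    base = cer(refs, baseline_hyps)
    new = cer(refs, improved_hyps)
    print(f"baseline CER={base:.4f}, improved CER={new:.4f}, "
          f"relative reduction={relative_reduction(base, new):.2%}")
```

Under this definition, a relative reduction of 11.21% means the new CER is about 88.79% of the baseline CER; it does not mean the absolute CER fell by 11.21 percentage points.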
Keywords/Search Tags: Automatic Speech Recognition, Multitask Learning, Adversarial Training, Transfer Learning, Accent Robust Speech Recognition