With the increasing trend of globalization and of cultural and trade exchanges between countries, multilingualism has become a common phenomenon in daily life. Speech recognition is a gateway to human-computer interaction, yet most existing state-of-the-art systems are monolingual: they can handle only one language at a time and cannot recognize code-switching speech. It is therefore important to build automatic speech recognition systems for code-switching speech.

The DNN-HMM framework has become the mainstream approach to acoustic modeling for speech recognition in recent years, but it has clear limitations for code-switching recognition tasks. First, a conventional DNN-HMM system is built on acoustic units such as pinyin syllables or phonemes. These units are defined independently for each language and have different acoustic properties, so the connections between the acoustic attributes of the two languages cannot be captured well by separate, language-specific pronunciation dictionaries. Second, because of the particular nature of code-switching speech and the sparsity of training data at switching points, the DNN-HMM model cannot effectively model the acoustic properties at the junction between the two languages (the code-switching point).

This paper therefore adopts an end-to-end (E2E) strategy and builds and studies a Chinese-English code-switching speech recognition system based on the Transformer framework with joint CTC training (a minimal sketch of this joint objective is given at the end of this section). The E2E model is built entirely on a unified neural network, eliminating the separate pronunciation dictionary, acoustic model, and language model of the DNN-HMM pipeline, so the entire mapping from input to output can be optimized jointly. Moreover, E2E models are usually based on character-level modeling units, which no longer correspond one-to-one to acoustic units; this blurs the association between modeling units and acoustic attributes and lets the network automatically balance the similarities and distinctions between the speech of different languages. In addition, because the E2E model is free from the conditional-independence assumption, it can learn the acoustic properties at code-switching points.

Furthermore, this paper proposes two novel Transformer-based structures: (1) an acoustic modeling algorithm based on the Transformer framework with a self-and-mixed attention mechanism, and (2) a "multi-encoder-decoder Transformer" structure designed to better explore the acoustic commonalities and distinctions between Chinese and English (an illustrative sketch of this idea also follows below). Experimental results on the SEAME dataset demonstrate that both proposed acoustic modeling algorithms significantly improve recognition performance over the baseline standard Transformer model and the baseline DNN-HMM model.
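The joint CTC training mentioned above refers to the widely used hybrid CTC/attention objective, which interpolates a CTC loss on the encoder output with a cross-entropy (attention) loss on the decoder output. The following PyTorch sketch illustrates that objective only; the module names, the interpolation weight of 0.3, and the reuse of a single target tensor for both branches are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a joint CTC/attention loss: L = w * L_ctc + (1 - w) * L_att.
# Names and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCTCAttentionLoss(nn.Module):
    def __init__(self, ctc_weight=0.3, blank_id=0, pad_id=-1):
        super().__init__()
        self.ctc_weight = ctc_weight
        self.pad_id = pad_id
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

    def forward(self, enc_logits, enc_lens, dec_logits, targets, target_lens):
        # CTC branch: nn.CTCLoss expects (T, N, C) log-probabilities.
        log_probs = F.log_softmax(enc_logits, dim=-1).transpose(0, 1)
        ctc_loss = self.ctc(log_probs, targets, enc_lens, target_lens)
        # Attention branch: per-token cross-entropy on the decoder outputs.
        # (For brevity the same target tensor is reused; a real system would
        # shift targets and add <sos>/<eos> for the decoder.)
        att_loss = F.cross_entropy(
            dec_logits.reshape(-1, dec_logits.size(-1)),
            targets.reshape(-1),
            ignore_index=self.pad_id,
        )
        return self.ctc_weight * ctc_loss + (1.0 - self.ctc_weight) * att_loss

# Toy check: batch of 2, 50 encoder frames, 8 target tokens, 3000-char vocab.
enc = torch.randn(2, 50, 3000)
dec = torch.randn(2, 8, 3000)
tgt = torch.randint(1, 3000, (2, 8))  # avoid the CTC blank id 0
lens, tlens = torch.tensor([50, 50]), torch.tensor([8, 8])
loss = JointCTCAttentionLoss(ctc_weight=0.3)(enc, lens, dec, tgt, tlens)
```

In this formulation the CTC branch acts as a regularizer that encourages monotonic input-output alignment, while the attention branch retains the decoder's full modeling flexibility.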
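The "multi-encoder-decoder Transformer" is one of the paper's own contributions, and its fusion details are not specified in this section. Purely as an illustration of the general idea, the sketch below assumes two language-specific encoders over shared acoustic features, with the decoder cross-attending to the concatenation of both encoder memories; the actual architecture in the paper may differ.

```python
# Illustrative two-encoder Transformer for code-switching ASR. This is NOT the
# paper's exact design; it only shows one plausible shape of the idea.
import torch
import torch.nn as nn

class TwoEncoderTransformer(nn.Module):
    def __init__(self, d_model=256, nhead=4, vocab=5000, layers=6):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(
            d_model, nhead, batch_first=True)
        self.enc_zh = nn.TransformerEncoder(make_layer(), layers)  # Mandarin-biased
        self.enc_en = nn.TransformerEncoder(make_layer(), layers)  # English-biased
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.dec = nn.TransformerDecoder(dec_layer, layers)
        self.embed = nn.Embedding(vocab, d_model)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, feats, tokens):
        # feats: (N, T, d_model) acoustic features (a convolutional front end
        # is omitted); tokens: (N, S) target ids. Causal masking for the
        # decoder is also omitted for brevity.
        memory = torch.cat([self.enc_zh(feats), self.enc_en(feats)], dim=1)
        dec_out = self.dec(self.embed(tokens), memory)
        return self.out(dec_out)
```

The intent of such a layout is that each encoder can specialize in the acoustics of one language while the shared decoder learns how to combine them, including at code-switching points.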