Research On Chinese Speech Recognition Technology Based On BPE And Transformer

Posted on:2020-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Luan

GTID:2428330590474474

Subject:Cyberspace security

Abstract/Summary:

Speech recognition is a precondition for human-computer voice interaction and is receiving growing attention from researchers. End-to-end acoustic modeling based on Connectionist Temporal Classification (CTC) has become one of the mainstream methods, but choosing the basic output unit for CTC prediction remains a design challenge. Recognition units are generally chosen on the basis of phonetic knowledge, but they can also be generated in a data-driven manner. Units determined in the latter way may have no clear phonetic meaning, yet they can still achieve good performance. In addition, speech recognition systems usually include a language model, and traditional approaches often use an n-gram language model. With the development of deep learning, finding an optimized strategy or network structure to improve the language model is also of great research value.

In this context, this thesis explores acoustic modeling and language modeling in automatic speech recognition systems. On the one hand, it proposes a new set of modeling units in combination with CTC theory; on the other hand, it explores a new neural network structure for the language model. Both aim to improve the overall performance of speech recognition.

First, this thesis applies the idea of the Byte Pair Encoding (BPE) algorithm to the acoustic model, improving recognition performance by selecting more suitable recognition units. Because CTC does not require a label for each frame of the input speech signal, a CTC acoustic model can use output units larger than phonemes, such as vowels and syllables. The BPE algorithm iteratively merges the most frequently occurring pairs of adjacent elements in the text and adds each merged element to the set of sub-word units. In this way it automatically discovers a well-suited set of recognition units and learns an appropriate way to decompose the target sequence.

In addition, this thesis uses a Transformer network to decode the syllable sequence output by the acoustic model into text. Compared with the n-gram model, the Transformer is better able to capture long-distance dependencies within a sentence, so it can make fuller use of context information, which is an advantage in syllable-to-character conversion. Experimental comparison shows that the system with the improved language model performs better. Compared with a Recurrent Neural Network (RNN), the Transformer also allows more parallel computation, which makes it well suited to language modeling. Combining BPE-based acoustic modeling with Transformer-based language modeling significantly improves Chinese speech recognition accuracy.
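
The BPE merging procedure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `learn_bpe` and `merge_pair`, the `_` joiner, the stopping rule (stop when no pair occurs at least twice), and the toy pinyin data are all assumptions made for illustration. It only shows the general idea of repeatedly merging the most frequent adjacent pair of units and adding the result to the unit set.

```python
from collections import Counter


def merge_pair(seq, pair, joiner="_"):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one merged unit."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + joiner + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges):
    """Learn up to `num_merges` merges from a list of unit sequences.

    Returns the ordered list of merged pairs and the re-segmented corpus.
    Hypothetical helper for illustration only.
    """
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair of units across the corpus.
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken by the pair itself
        # so that the result is deterministic.
        best, count = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if count < 2:  # assumed stopping rule: nothing left worth merging
            break
        merges.append(best)
        corpus = [merge_pair(seq, best) for seq in corpus]
    return merges, corpus


if __name__ == "__main__":
    # Toy transcripts written as pinyin syllables (made-up data).
    data = [["zhong", "guo", "ren"], ["zhong", "guo", "hua"], ["ni", "hao"]]
    merges, segmented = learn_bpe(data, num_merges=5)
    print(merges)
    print(segmented)
```

In this toy run the pair (`zhong`, `guo`) is the only one that occurs twice, so it becomes a single larger unit while the remaining syllables stay as they are. A real system would run many more merges over a large transcript corpus and use the resulting units as CTC output labels.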

Keywords/Search Tags:

speech recognition, BPE, CTC, Transformer

Related items

1	Research On Continuous Speech Recognition System Based On Transformer
2	Research On Chinese Speech Recognition Technology Based On BPE And Transformer
3	Research On Amdo Tibetan Speech Recognition Technology Based On MRDCNN-CTC-Transformer
4	Mandarin Automatic Speech Recognition Based On Transformer
5	Research On End-to-end Acoustic Model Of Code-switching Speech Recognition
6	Design Of Speech Recognition Algorithm For Human Computer Interaction In Machine Operation Environment
7	The Research On Speech Emotion Recognition Based On Contextual Position Enhancement And Weighted Space
8	Research On Abstractive Short Text Automatic Summarization Method In Speech Recognition Scenarios
9	Research On Speech Emotion Recognition Based On Spatiotemporal Feature Fusion
10	Research On End-to-End Simultaneous Speech Translation Based Transformer Transducer