Research And Application Of Speech Recognition Based On Conformer

Posted on:2024-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:J K Lu

Full Text:PDF

GTID:2568307076976609

Subject:Engineering

Abstract/Summary:

As one of the key technologies for human-computer interaction, speech recognition is widely used in voice input, voice search, and intelligent voice assistants, and its recognition performance is closely tied to user experience. End-to-end speech recognition models, which have emerged in recent years, simplify the cumbersome recognition pipeline of traditional models, lower the barrier to entry in the field, and have gradually become a research hotspot. However, such models still suffer from limited recognition performance and demanding requirements on terminal devices. To address these shortcomings, this thesis carries out the following work based on the Conformer model:

(1) Transformer-based end-to-end speech recognition models have insufficient ability to capture fine local features and weak language modeling ability. To address this, an architecture combining the Conformer model with an N-gram language model is proposed, consisting of a Conformer encoder and a CTC WFST search decoder. It both enhances the model's ability to extract subtle local features and improves its language modeling ability. Experiments are conducted on the AISHELL-1 dataset, the aidatatang_200zh dataset, and a dataset containing noise. The results show that the method effectively improves recognition accuracy, is reasonably advanced, and remains applicable to some extent in noisy environments.

(2) CTC-based end-to-end speech recognition models have difficulty exploiting the contextual relationships among speech features, while attention-based end-to-end models rely too heavily on attention. To address both limitations, a model structure based on the Conformer model is proposed that uses a CTC decoder and a multi-head attention decoder combined with an N-gram language model to further reduce the character error rate (CER). Experiments are conducted on the AISHELL-1 and aidatatang_200zh datasets, and the results show that the method further improves recognition accuracy. Comparison with other recent models shows that the proposed model has better recognition performance.

(3) Conformer-based end-to-end speech recognition models have high computational complexity and are difficult to deploy on terminal devices. To address this, a model compression method for the Conformer model is proposed that combines structured interval pruning with quantization. Structured interval pruning is applied to the convolutional part of the Conformer module, and quantization is applied to the linear layers. Experiments are conducted on the AISHELL-1 dataset, and the results show that at a pruning ratio of 0.2 the CER rises very little, inference speed improves significantly, and model size is effectively reduced, so that the Conformer model can better meet the needs of practical applications.
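The results above are reported as CER, which the abstract does not define. The following is a minimal sketch that assumes the standard definition: the character-level Levenshtein (edit) distance between the reference transcript and the recognized hypothesis, divided by the number of reference characters. The function names `edit_distance` and `cer` are illustrative and are not taken from the thesis.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters here)."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j symbols of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference symbols so far
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # reference symbol missing in hyp
            insertion = curr[j - 1] + 1  # extra symbol present in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if len(ref) == 0:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character in a six-character reference.
    print(cer("今天天气很好", "今天天气真好"))
```

Because insertions count as errors, the CER under this definition can exceed 1.0 when the hypothesis is much longer than the reference. Corpus-level CER is usually computed by summing edit distances and reference lengths over all utterances before dividing, rather than averaging per-utterance rates.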

Keywords/Search Tags:

Automatic speech recognition, End-to-End model, Conformer, Language model, Model compression

Related items

1	Application Research On Statistical Language Model Of Large Vocabulary Continuous Speech Recognition System
2	End-to-End Speech Recognition Model Research And System Construction
3	Researching Of The Mongolian Language Model Based On Speech Recognition
4	Researching And Building Of The Mongolian Large Vocabulary Independent Continuous Speech Recognition System
5	Mongolian Language Model Based On Recurrent Neural Network
6	Chinese Speech Recognition System Based On CLDNN Hybrid Model
7	Study And Improve On The Mongolian Speech Recognition System
8	Research On Statistical Language Model Of Large-Vocabulary Continuous Speech Recognition System
9	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
10	Research On Tibetan Language Model For Continuous Speech Recognition