
Research And Application Of Speech Recognition Based On Conformer

Posted on: 2024-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: J K Lu
Full Text: PDF
GTID: 2568307076976609
Subject: Engineering
Abstract/Summary:
As one of the key technologies for human-computer interaction, speech recognition is widely used in voice input, voice search and intelligent voice assistants, and its recognition performance is closely tied to user experience. The end-to-end speech recognition models that have emerged in recent years simplify the cumbersome recognition pipeline of traditional models, lower the barrier to entry in the field, and have gradually become a research hotspot. However, such models still suffer from problems such as low recognition performance and demanding requirements on terminal devices. To address these shortcomings of end-to-end speech recognition models, this thesis carries out the following work based on the Conformer model:

(1) Transformer-based end-to-end speech recognition models have insufficient ability to capture fine local features and weak language modeling ability. To address this, an architecture combining the Conformer model with an N-gram language model is proposed, consisting of a Conformer encoder and a CTC WFST search decoder. It enhances the model's ability to extract subtle local features and also improves its language modeling ability. Experiments are conducted on the AISHELL-1 dataset, the aidatatang_200zh dataset and a dataset containing noise. The results show that the method effectively improves recognition accuracy, represents a certain advance, and retains some applicability in noisy environments.

(2) CTC-based end-to-end speech recognition models have difficulty exploiting the contextual relationships among speech features, while attention-based end-to-end models rely too heavily on attention. To address both limitations, a Conformer-based model structure is proposed that uses a CTC decoder and a multi-head attention decoder combined with an N-gram language model to further reduce the character error rate (CER). Experiments are conducted on the AISHELL-1 and aidatatang_200zh datasets, and the results show that the method further improves recognition accuracy. A comparison with other recent models shows that the proposed model has better recognition performance.

(3) Conformer-based end-to-end speech recognition models have high computational complexity and are difficult to deploy on terminal devices. To address this, a model compression method is proposed that combines structured interval pruning with quantization of the Conformer model. Structured interval pruning is applied to the convolutional part of the Conformer module, and quantization is applied to the linear layers. Experiments are conducted on the AISHELL-1 dataset. The results show that at a pruning ratio of 0.2 the CER rises very little, while inference speed improves significantly and model size is effectively reduced, so that the Conformer model can better meet the needs of practical applications.
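The results above are reported as character error rate (CER), the usual metric for Mandarin corpora such as AISHELL-1. As a minimal sketch of how that metric is commonly computed (the function names `edit_distance` and `cer` are illustrative and are not taken from the thesis), CER is the character-level edit distance between the reference and the hypothesis, divided by the number of reference characters:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length.

    Assumption for this sketch: spaces are ignored, which is a common
    convention when scoring Chinese transcripts.
    """
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(cer("今天天气很好", "今天天汽很好"))
```

Because insertions count as errors, CER can exceed 1.0 when the hypothesis is much longer than the reference.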
Keywords/Search Tags:Automatic speech recognition, End-to-End model, Conformer, Language model, Model compression