
Research On Lightweight Speech Recognition Technology In Noise Environment

Posted on: 2022-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: J W Xue
Full Text: PDF
GTID: 2518306779494654
Subject: Computer Software and Application of Computer
Abstract/Summary:
In recent years, demand for speech recognition across different scenarios has risen sharply, and its practicality in complex scenarios has drawn attention from both academia and industry. Examples include speech recognition in vehicles, on terminal devices in noisy environments, and on IoT devices in factories. In these scenarios, environmental noise degrades recognition performance, and the robustness of the system across scenarios cannot be guaranteed. In addition, traditional speech recognition models are structurally complex, consisting of an acoustic model (AM) based on a Gaussian mixture model and hidden Markov model (GMM-HMM), a language model (LM), a dictionary, and a decoder. Such systems are usually built on statistical models and involve multiple objective functions, so it is difficult to learn weights during training that are optimal for the overall model. Existing speech recognition models also have a very large number of parameters, which makes them hard to deploy on resource-constrained terminal devices.

Based on the above problems, this thesis proposes optimization methods from two aspects: speech features and model design. For speech features, a speech feature enhancement algorithm based on sparse representation is proposed to improve the robustness of speech recognition systems across scenarios. For model design, a lightweight end-to-end speech recognition model is proposed that has a simple structure and is easy to deploy. The specific work of this thesis is as follows:

(1) To address interference from background noise or self-noise in speech data, this thesis proposes a speech feature enhancement algorithm based on sparse representation, drawing on the field of compressed sensing. Data signals such as speech and images can be represented sparsely, whereas noise cannot. Sparse representation of the speech features can therefore separate the features that carry useful information from the noise features while retaining the essential characteristics of the speech signal. The thesis compares the proposed algorithm with traditional enhancement algorithms. The experimental results show that it outperforms traditional algorithms such as Wiener filtering and spectral subtraction and improves the recognition performance of the speech recognition system in various noise scenarios. The algorithm is also evaluated in a multimodal speech recognition system, where it retains its advantage.

(2) This thesis proposes a lightweight end-to-end speech recognition model, CNN1D-CTC, which consists of a one-dimensional convolutional neural network (CNN) and the Connectionist Temporal Classification (CTC) algorithm. The model has a simple structure, fewer parameters than other models, and a smaller memory footprint, so it is easy to deploy on small, resource-limited terminal devices. As an end-to-end model, CNN1D-CTC learns to align speech data and labels at the frame level adaptively through neural network training. The CNN1D-CTC model therefore reduces the number of parameters while achieving higher recognition performance. In addition, based on the end-to-end model, system performance with different input features is explored. Experiments show that multimodal features as input yield better recognition performance than single-modal features.
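
The sparse-representation idea in item (1) can be illustrated with a small sketch. This is not the thesis's algorithm: the abstract does not specify the dictionary or the solver, so the sketch assumes a fixed DCT basis and simple hard thresholding (keeping the `keep` largest coefficients) as a stand-in. All function names and the toy signal are hypothetical.

```python
import math
import random

def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def sparse_denoise(frame, keep):
    """Keep only the `keep` largest-magnitude DCT coefficients of a frame."""
    coeffs = dct(frame)
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    top = set(order[:keep])
    sparse = [c if k in top else 0.0 for k, c in enumerate(coeffs)]
    return idct(sparse)

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Toy example (assumed data): a signal that is exactly 3-sparse in the DCT
# basis, corrupted by dense Gaussian noise that spreads over all coefficients.
random.seed(0)
n = 64
true_coeffs = [0.0] * n
true_coeffs[3], true_coeffs[10], true_coeffs[17] = 5.0, -3.0, 2.0
clean = idct(true_coeffs)
noisy = [v + random.gauss(0.0, 0.2) for v in clean]
denoised = sparse_denoise(noisy, keep=3)

print("noisy MSE:   ", mse(noisy, clean))
print("denoised MSE:", mse(denoised, clean))
```

Because the signal energy concentrates in a few coefficients while the noise spreads across all of them, discarding the small coefficients removes most of the noise. A learned dictionary or an L1-based solver, as is common in compressed sensing, would replace the fixed basis and the top-k rule in a fuller treatment.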
Keywords/Search Tags: Sparse representation, Speech feature enhancement, End-to-end speech recognition, Multimodal features