
Design And Implementation Of Embedded Speech Recognition System Based On Deep Learning

Posted on: 2022-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhuang
GTID: 2518306524480264
Subject: Computer Science and Technology
Abstract/Summary:
With the development of artificial intelligence technology, speech is not only a medium of human communication but also a convenient channel for human-computer interaction. Speech recognition technology has developed rapidly in recent years and is gradually being applied in many fields. Deep learning has brought a qualitative leap in recognition accuracy, but the increasingly complex network models are difficult to deploy on embedded devices. In addition, real speech scenes always contain noise, such as environmental noise, equipment noise, and engine noise, which degrades recognition performance. How to compress a model to a size suitable for embedded devices while preserving recognition accuracy has therefore become a problem studied by many scholars. This thesis designs a lightweight, end-to-end Chinese speech recognition model based on deep learning and ports it to embedded devices for testing. The specific work is as follows:

1. To address the shortage of open source speech datasets and the noise present in real speech environments, this thesis collects and organizes Chinese open source datasets into Large-Dataset and designs a noise suppression algorithm that integrates deep learning methods into traditional signal processing methods. The algorithm reduces the character error rate by 1.48% when tested on noisy datasets.
2. To address the problem that speech recognition models are generally large, this thesis investigates an end-to-end speech recognition scheme that uses convolutional kernels as the core of the backbone network, handles long-distance dependence with GCN, and yields a fully convolutional lightweight neural network. CTC is used for automatic alignment, which resolves the mismatch between input and output sequence lengths.
3. To address the extremely unbalanced distribution of Chinese character samples, this thesis combines the idea of Focal Loss with CTC Loss so that the loss pays different amounts of attention to characters with different frequencies. This reduces the impact of unbalanced samples on recognition accuracy and lowers the character error rate by 0.85%.
4. To address the small memory and limited computational power of embedded environments, this thesis uses an 8-bit weight quantization technique to compress the model to nearly one-fourth of its original size. In addition, a shift quantization acceleration scheme is designed: a suitable codebook is used to optimize the 8-bit quantized weights so that a large number of convolutional multiplications are converted into shift-and-add operations. This increases the inference speed of the model on the embedded system by 40% at the cost of a 0.6% increase in character error rate.
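The 8-bit weight quantization and shift quantization steps described above can be illustrated with a minimal sketch. The abstract does not specify the thesis's codebook or quantization scheme, so this example assumes symmetric per-tensor quantization and a codebook made of signed powers of two; the function names `quantize_int8`, `to_power_of_two`, and `shift_dot` are hypothetical and are not taken from the thesis.

```python
import math


def quantize_int8(weights):
    """Symmetric per-tensor 8-bit quantization (an assumed scheme).

    Returns integer codes q in [-127, 127] and a scale so that w ~= scale * q.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def to_power_of_two(q):
    """Map an integer weight to (sign, exponent) with |q| ~= 2**exponent.

    Assumed codebook: signed powers of two, plus zero encoded as (0, 0).
    """
    if q == 0:
        return 0, 0
    sign = 1 if q > 0 else -1
    exponent = round(math.log2(abs(q)))
    return sign, exponent


def shift_dot(activations, codes):
    """Dot product of integer activations with power-of-two weights.

    Each multiplication a * (sign * 2**exponent) becomes a left shift
    followed by an add or subtract.
    """
    acc = 0
    for a, (sign, exponent) in zip(activations, codes):
        acc += sign * (a << exponent)
    return acc


# Usage: quantize float weights, snap them to the codebook, then accumulate
# with shifts. The true result in floating point is approximately scale * acc.
q, scale = quantize_int8([0.9, -0.4, 0.05, 0.0])
codes = [to_power_of_two(v) for v in q]
acc = shift_dot([3, 5, -2, 7], codes)
```

Storing one byte per weight instead of a 32-bit float is what gives a compression ratio close to one-fourth; snapping weights to powers of two trades some accuracy for cheaper arithmetic, which is consistent with the small character error rate loss the abstract reports.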
Keywords/Search Tags: speech recognition, noise suppression, lightweight neural network, shift quantization, weight quantization