
Research On Embedded Speech Recognition System Based On Deep Learning

Posted on: 2020-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: M H Yang
Full Text: PDF
GTID: 2428330578957998
Subject: Electronic and communication engineering
Abstract/Summary:
With the progress of deep learning, speech recognition has completed the transition from traditional models to deep-learning models. The main purpose of this thesis is to solve the problem of offline speech recognition on mobile terminals and to improve recognition accuracy. Deep-learning models trained on a computer are ported to a Raspberry Pi 3B+ for speech recognition. The system is divided into two parts: an acoustic model and a language model. The acoustic model is an optimized DFCNN (Deep Fully Convolutional Neural Network): the sound signal is converted into a spectrogram and fed to the network, which, after training, converts input speech into pinyin. The language model is built and trained on the encoder of the Transformer, the architecture Google constructed for English-German translation; its purpose is to convert pinyin into Chinese characters.

The work on these models is as follows:

1. The deep-learning framework TensorFlow is used to construct and train the optimized DFCNN and the Transformer encoder. The models are then quantized with TensorFlow's quantization toolchain and deployed on an embedded Linux platform, the Raspberry Pi 3B+, where they perform speech recognition (a sketch of such a model and its quantization follows this abstract).

2. To make the training samples rich enough, Tsinghua University's open-source THCHS-30 speech corpus is selected for training. For the acoustic model, traditional feature-extraction methods such as MFCC and LPCC are discarded; instead, features are extracted by the convolutional network itself, in a manner similar to image recognition. Python converts each speech signal into a spectrogram, and the spectrograms serve as the input data of the acoustic model (a sketch of this conversion follows this abstract). A trained Bi-LSTM model is used as the main baseline, and the recognition speed and accuracy of the two models are tested and analyzed on both the computer and the Raspberry Pi 3B+.

3. The language model is built on the Transformer encoder; the processed pinyin and Chinese characters are fed to the model for training. Both the pinyin and the characters are mapped to ID lists through dictionaries, padded, and input to the model (sketches of the encoder block and of this data path follow this abstract). In the test phase, the model's performance and speed are compared with a traditional n-gram model, and the advantages and disadvantages of the Transformer encoder are analyzed.

4. The Raspberry Pi 3B+ is equipped with the ReSpeaker 2-Mics Pi HAT and its supporting software to capture the speaker's voice. This hardware filters out some noise, and preprocessing further improves the signal-to-noise ratio of the voice signal and thus the recognition rate.

Comparing the DFCNN and the Transformer encoder with other mainstream models on the training set and on speech collected from students in the lab leads to the conclusion that both are suitable for deployment on the embedded platform: recognition accuracy and speed reach the expected level, and a satisfactory recognition rate and recognition speed are achieved in practice.
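The thesis does not reproduce its model code; the following is a minimal Python/TensorFlow sketch of a DFCNN-style acoustic model and one plausible route through TensorFlow's quantization toolchain (the TFLite converter). The layer sizes, the pinyin vocabulary size of 1300, and the frequency-bin count are illustrative assumptions, not values taken from the thesis, and the CTC loss and decoding needed for pinyin output are omitted.

import tensorflow as tf

def build_dfcnn(freq_bins=200, n_pinyin=1300):
    # Spectrogram input: (time, frequency, 1 channel); time length is dynamic.
    inp = tf.keras.Input(shape=(None, freq_bins, 1))
    x = inp
    for filters in (32, 64, 128):
        # Paired 3x3 convolutions followed by 2x2 pooling, as in image models.
        x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
    # Flatten the frequency axis so each time step becomes one feature vector.
    x = tf.keras.layers.Reshape((-1, (freq_bins // 8) * 128))(x)
    # Per-frame pinyin distribution; the extra class is the CTC blank.
    out = tf.keras.layers.Dense(n_pinyin + 1, activation="softmax")(x)
    return tf.keras.Model(inp, out)

model = build_dfcnn()

# Post-training quantization, shrinking the model for the Raspberry Pi 3B+.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
with open("dfcnn.tflite", "wb") as f:
    f.write(converter.convert())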
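For the spectrogram front end, the thesis states only that Python converts the speech signal into a spectrogram. Below is a minimal sketch of that conversion with NumPy and SciPy, assuming 16 kHz mono audio and a 25 ms Hamming window with a 10 ms shift; these parameters and the file name are illustrative, not taken from the thesis.

import numpy as np
import scipy.io.wavfile as wav

def wav_to_spectrogram(path, frame_ms=25, shift_ms=10):
    rate, signal = wav.read(path)                    # e.g. 16 kHz mono THCHS-30 audio
    signal = signal.astype(np.float64)
    frame_len = int(rate * frame_ms / 1000)          # samples per analysis window
    shift = int(rate * shift_ms / 1000)              # samples per hop
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // shift
    frames = np.stack([signal[i * shift : i * shift + frame_len] * window
                       for i in range(n_frames)])    # (frames, frame_len)
    spec = np.abs(np.fft.rfft(frames, n=frame_len))  # magnitude spectrum per frame
    return np.log1p(spec)                            # log compression

spectrogram = wav_to_spectrogram("sample.wav")       # shape: (frames, freq_bins)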
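The language model reuses the Transformer encoder. As a reconstruction, the block below sketches one standard encoder layer (multi-head self-attention plus a position-wise feed-forward network, with residual connections and layer normalization) using Keras layers; the dimensions are assumptions, and the token embedding and positional encoding that would precede it are omitted.

import tensorflow as tf

def encoder_block(d_model=256, n_heads=4, d_ff=1024):
    inp = tf.keras.Input(shape=(None, d_model))      # (sequence, embedding dim)
    # Multi-head self-attention: the sequence attends to itself.
    attn = tf.keras.layers.MultiHeadAttention(
        num_heads=n_heads, key_dim=d_model // n_heads)(inp, inp)
    x = tf.keras.layers.LayerNormalization()(inp + attn)   # residual + norm
    # Position-wise feed-forward network applied to every time step.
    ff = tf.keras.layers.Dense(d_ff, activation="relu")(x)
    ff = tf.keras.layers.Dense(d_model)(ff)
    out = tf.keras.layers.LayerNormalization()(x + ff)     # residual + norm
    return tf.keras.Model(inp, out)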
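The pinyin-to-character data path described in item 3 can be illustrated as follows: tokens on both sides are mapped to integer IDs through dictionaries, padded to a fixed length, and fed to the model. The toy vocabularies and the maximum length of 8 here are hypothetical.

# Toy dictionaries; the real ones are built from the training corpus.
pinyin_vocab = {"<pad>": 0, "ni3": 1, "hao3": 2, "zhong1": 3, "guo2": 4}
char_vocab   = {"<pad>": 0, "你": 1, "好": 2, "中": 3, "国": 4}

def to_ids(tokens, vocab, max_len=8):
    ids = [vocab[t] for t in tokens]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))   # pad to fixed length

src = to_ids(["ni3", "hao3"], pinyin_vocab)   # model input: pinyin IDs
tgt = to_ids(["你", "好"], char_vocab)          # training target: character IDs
print(src)   # [1, 2, 0, 0, 0, 0, 0, 0]
print(tgt)   # [1, 2, 0, 0, 0, 0, 0, 0]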
Keywords/Search Tags:deep learning, embedded platform, speech recognition, acoustic model, language model