Research On Speech Recognition Of Mobile Phone Based On Deep Neural Network

Posted on: 2021-12-13

Degree: Master

Type: Thesis

Country: China

Candidate: K Lin

Full Text: PDF

GTID: 2518306569490594

Subject: Master of Engineering

Abstract/Summary:

Speech recognition has long been regarded as the best way for humans to communicate with machines, because it removes the cost of typing. However, public information shows that although speech recognition in a quiet environment, such as a living room, has reached a level close to what real-world use requires, speech recognition in mobile applications is still unsatisfactory. A likely reason is that the acoustic environment of a mobile device is very complicated, which seriously degrades the accuracy of speech recognition algorithms. To address this issue, this thesis develops a front-end noise reduction algorithm, a voice activity detection (VAD) algorithm and a keyword spotting algorithm that are integrated with the speech recognition system, and optimizes each of them using deep neural network models.

Weighing the advantages and disadvantages of traditional digital signal processing methods against those of deep neural network methods, this thesis first uses deep neural networks, in an offline setting, to classify different types of noise under different combinations of algorithm parameters. Based on the classification accuracy, the best parameter set is chosen empirically and used to initialize the subsequent Wiener filtering algorithm. This initialization is intended to give the Wiener filter stronger adaptability and to speed up its convergence, which in turn improves noise reduction performance.

In addition, a VAD algorithm based on a hybrid of a convolutional neural network (CNN) and a long short-term memory network (LSTM) is proposed. It uses the feature extraction ability of the CNN to extract features from the input feature vectors, and then feeds the resulting high-dimensional feature vectors into the LSTM to capture the temporal dependencies between speech frames. The proposed CNN-LSTM-DNN network combines the advantages of the different network types and thereby improves the robustness of voice activity detection at low signal-to-noise ratios. Compared with either a CNN or an LSTM alone, the CNN-LSTM-DNN achieves the best performance.

Finally, this thesis studies the use of multi-head attention in keyword spotting and proposes an orthogonality-constrained multi-head attention mechanism. Regularization terms are derived by constraining the context vectors and the score vectors of different attention heads to be mutually orthogonal. Inter-head orthogonality of the context vectors and of the score vectors makes the attention heads less redundant with one another, while intra-head non-orthogonality of the context vectors makes each head consistent across samples for a given task. The results show that the proposed regularization improves keyword spotting performance by reducing the false rejection rate, at the cost of only a small increase in model size.
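The abstract does not give the Wiener filter formulation or the parameter sets that the offline noise classifier selects. As a minimal sketch, assuming a standard decision-directed Wiener filter (the Ephraim-Malah a priori SNR estimate, swapped in here for illustration) applied to a complex STFT, with the noise PSD taken from leading noise-only frames, the per-noise-class initialization might look like the following. The dictionary `PARAMS_BY_NOISE`, its class names and values, and the function `wiener_denoise` are hypothetical; the DNN classifier that would supply the noise class is not shown.

```python
import numpy as np

# Hypothetical parameter sets, one per noise class. In the thesis the best set
# for each noise type is chosen offline from DNN classification accuracy; the
# class names and values here are placeholders only.
PARAMS_BY_NOISE = {
    "stationary": {"alpha": 0.98, "gain_floor": 0.05, "init_frames": 10},
    "babble": {"alpha": 0.92, "gain_floor": 0.10, "init_frames": 5},
}


def wiener_denoise(spec: np.ndarray, params: dict) -> np.ndarray:
    """Decision-directed Wiener filter applied to a complex STFT.

    spec: complex array of shape (frames, bins).
    The noise PSD is estimated from the first `init_frames` frames, which are
    assumed (by this sketch) to contain noise only.
    """
    alpha = params["alpha"]          # smoothing of the a priori SNR estimate
    floor = params["gain_floor"]     # lower bound on the gain
    n0 = params["init_frames"]

    power = np.abs(spec) ** 2
    noise_psd = np.maximum(power[:n0].mean(axis=0), 1e-12)  # avoid division by zero

    out = np.empty_like(spec)
    prev_clean_power = np.zeros(spec.shape[1])
    for t in range(spec.shape[0]):
        post_snr = power[t] / noise_psd                      # a posteriori SNR (gamma)
        prio_snr = (alpha * prev_clean_power / noise_psd     # decision-directed
                    + (1.0 - alpha) * np.maximum(post_snr - 1.0, 0.0))  # a priori SNR (xi)
        gain = np.maximum(prio_snr / (1.0 + prio_snr), floor)  # Wiener gain xi / (1 + xi)
        out[t] = gain * spec[t]
        prev_clean_power = np.abs(out[t]) ** 2               # feeds the next frame's xi
    return out
```

In use, a classifier's label would pick the initialization, e.g. `wiener_denoise(spec, PARAMS_BY_NOISE[noise_class])`. Keeping the parameters in a lookup table means the offline search only has to fill the table, while the online filter stays unchanged.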
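The orthogonality-constrained attention regularizer is described above only in words. A minimal NumPy sketch of one plausible form follows, assuming per-head context vectors of shape (batch, heads, dim), per-head attention score vectors of shape (batch, heads, time), and integer keyword labels, with "samples for a given task" taken to mean samples of the same keyword. The function names, the squared-cosine and (1 - cosine) penalty forms, and the weights `lam_inter` and `lam_intra` are assumptions rather than the thesis's definitions; in training these terms would be written in an autodiff framework so they can be back-propagated.

```python
import numpy as np


def _unit(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # L2-normalize along the last axis so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def inter_head_penalty(vecs: np.ndarray) -> float:
    """Squared off-diagonal cosine similarities between heads, averaged over the batch.

    vecs: (batch, heads, dim) context vectors, or (batch, heads, time) score vectors.
    Zero when every pair of heads is orthogonal within each sample.
    """
    u = _unit(vecs)
    gram = u @ np.swapaxes(u, 1, 2)                   # (batch, heads, heads)
    off_diag = gram * (1.0 - np.eye(vecs.shape[1]))   # drop each head's self-similarity
    return float(np.mean(np.sum(off_diag ** 2, axis=(1, 2))))


def intra_head_penalty(context: np.ndarray, labels) -> float:
    """Mean (1 - cosine) between same-label samples, computed within each head.

    context: (batch, heads, dim); labels: (batch,) keyword ids.
    Zero when each head points in the same direction for all samples of a keyword.
    """
    u = _unit(context)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                     # ignore self-pairs
    if not same.any():
        return 0.0
    sim = np.einsum("ihd,jhd->hij", u, u)             # sim[h, i, j]: cosine of samples i, j in head h
    return float(np.mean(1.0 - sim[:, same]))


def regularized_loss(task_loss, context, scores, labels,
                     lam_inter=0.1, lam_intra=0.1) -> float:
    # lam_inter and lam_intra are hypothetical weights, not values from the thesis
    return (task_loss
            + lam_inter * (inter_head_penalty(context) + inter_head_penalty(scores))
            + lam_intra * intra_head_penalty(context, labels))
```

The inter-head term pushes heads apart within each sample (less redundancy), while the intra-head term pulls each head toward a stable direction across samples of the same keyword, mirroring the two effects the abstract attributes to the regularization.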

Keywords/Search Tags:

noise reduction algorithm, VAD algorithm, neural networks, keyword detection, speech recognition

Related items

1	Neural Network-Based Speech Keyword Recognition Algorithm And Circuit Design For Low Signal-To-Noise Ratio
2	Design And Implementation Of Noise Robust Speech Recognition Algorithm Based On Deep Learning
3	Design And Implementation Of Voice Wake-up System For Voiceprint Recognition Based On Deep Learning
4	Research And Implementation Of Chinese Speech Keyword Recognition Algorithm
5	The Research Of Key Techniques Of Speech Separation And Speech Recognition
6	Research On Human Computer Interaction Based On Speech Keyword Spotting
7	Real-time Noise Reduction For Conference Telephone
8	Study Of Speech Recognition Algorithm Under Noise Environment
9	The Research Of Speech Recognition Base On GA-ACO Algorithm And BP Neural Networks
10	Design Of Low-power Keyword Recognition Feature Extraction Module For High Noise Scence