
Research On Speech Recognition Of Mobile Phone Based On Deep Neural Network

Posted on: 2021-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: K Lin
Full Text: PDF
GTID: 2518306569490594
Subject: Master of Engineering
Abstract/Summary:
Speech recognition reduces the cost of manual key-in and has long been regarded as the best way for humans to communicate with machines. However, publicly available results show that although the accuracy of speech recognition in a quiet environment, e.g., a living room, is close to that of a real-world environment, speech recognition for mobile applications is still unsatisfactory. A possible reason is that the acoustic environment of a mobile device is very complicated, so the accuracy of the speech recognition algorithm deteriorates seriously. To address this issue, this thesis develops a front-end noise reduction algorithm, a voice activity detection algorithm and a keyword spotting algorithm integrated with the speech recognition system, and optimizes these algorithms using deep neural network models. Weighing the advantages and disadvantages of traditional digital signal processing methods and deep neural network methods, this thesis first employs deep neural networks to classify different types of noise with different combinations of parameters in an offline environment. According to the classification accuracy, the best parameter set is chosen empirically to initialize the subsequent algorithm, i.e., the Wiener filtering algorithm. This can give the Wiener filtering algorithm stronger adaptability and speed up its convergence, which in turn improves the noise reduction performance. In addition, a voice activity detection algorithm based on a hybrid of a convolutional neural network (CNN) and a long short-term memory network (LSTM) is proposed. It uses the feature extraction ability of the CNN to derive feature vectors from the input; the extracted high-dimensional feature vectors are then fed into the LSTM to capture the temporal dependencies between speech frames. The proposed CNN-LSTM-DNN network combines the advantages of the different neural networks and thereby improves the robustness of the voice activity detection algorithm under low signal-to-noise ratios. Compared with either the CNN or the LSTM alone, the CNN-LSTM-DNN achieves the best performance. Finally, this thesis studies the use of multi-head attention in keyword spotting tasks and proposes an orthogonally constrained multi-head attention mechanism. The regularization constrains the context vectors and the score vectors of different attention heads to be mutually orthogonal. Regularization by orthogonality between the heads, applied to the context vector and the score vector, makes the attention heads less redundant with each other, while regularization through the non-orthogonality of each head's context vector makes it consistent across samples for a given task. The results show that the proposed regularization technique improves keyword detection performance, reducing the false rejection rate with only a small increase in model size.
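The idea of penalizing redundancy between attention heads can be illustrated with a small sketch. The abstract does not give the exact loss, so the form below is an assumption: a commonly used orthogonality penalty, the squared Frobenius norm of the difference between the Gram matrix of unit-normalized head vectors and the identity. The function names `normalize` and `inter_head_orthogonality_penalty` and the toy vectors are hypothetical and are not taken from the thesis.

```python
import math


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def inter_head_orthogonality_penalty(heads):
    """Assumed penalty: ||G - I||_F^2, where G is the Gram matrix of the
    unit-normalized per-head vectors (e.g. context or score vectors).
    It is zero when all heads are mutually orthogonal and grows as heads
    point in similar directions."""
    unit = [normalize(h) for h in heads]
    penalty = 0.0
    for i, a in enumerate(unit):
        for j, b in enumerate(unit):
            dot = sum(x * y for x, y in zip(a, b))
            target = 1.0 if i == j else 0.0
            penalty += (dot - target) ** 2
    return penalty


# Toy per-head vectors (illustrative values only).
orthogonal_heads = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
redundant_heads = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]

print(inter_head_orthogonality_penalty(orthogonal_heads))
print(inter_head_orthogonality_penalty(redundant_heads))
```

In training, a term of this kind would typically be added to the task loss with a small weight, so that heads are pushed apart without dominating the keyword spotting objective; the weighting used in the thesis is not stated in the abstract.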
Keywords/Search Tags: noise reduction algorithm, VAD algorithm, neural networks, keyword detection, speech recognition