Speech Keyword Matching Model Based On Deep Learning

Posted on: 2019-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wu
Full Text: PDF
GTID: 2428330566997942
Subject: Computer Science and Technology
Abstract/Summary:
Voice is the most basic and effective way for people to communicate in everyday life, and it has long been a goal to have computers perform speech recognition automatically. With the development of computer technology, the internet, and artificial intelligence, a large amount of data such as audio has been generated on the internet, which has contributed greatly to the development of speech recognition. Keyword-based voice wake-up technology is now in high demand. For example, intelligent personal assistants such as Siri and smart speakers such as the Amazon Echo are activated by voice using speech keyword matching. Traditional speech keyword matching is mostly based on conventional acoustic models such as Hidden Markov Models and Gaussian Mixture Models, whereas neural network models based on deep learning are now widely used in speech recognition. This thesis focuses on the problem of speech keyword matching and studies traditional speech matching techniques and speech feature extraction, along with deep learning and similarity matching algorithms. The main research content is divided into the following three aspects.

(1) Speech keyword matching based on speech recognition. This model is built on the LSTM and uses the connectionist temporal classification (CTC) algorithm instead of the traditional mean squared error (MSE) loss function to train the model more effectively. In the LSTM+CTC framework, the input features are mel-frequency cepstral coefficients (MFCCs). The speech is transcribed into a text string by the long short-term memory network and a fully connected network, and a similarity algorithm then compares the two recognized strings to obtain the matching result.

(2) End-to-end speech keyword matching. Unlike the speech recognition model, the end-to-end model does not convert the speech into text. Instead, a feature extraction network is trained to produce a feature map, and a matching network then compares the similarity between the utterances. The feature extraction network combines CLDNN with a Siamese neural network, and the spectrogram is used as the model's input feature. With a small number of parameters, the model performs well at speech keyword matching compared with CNN and LSTM.

(3) Comparative evaluation of the models. Experiments show that both the speech-recognition-based model with fuzzy matching and the end-to-end keyword matching model perform well compared with the commonly used CNN, LSTM, and CLDNN models. In practical applications, speech keyword matching is sensitive to negative cases. When the speech-recognition-based model uses exact matching, it achieves a recall rate of 100%, although its performance on positive examples is not ideal. The end-to-end keyword matching model comes next: as the threshold changes, it maintains a recall rate of about 95%. This shows that the models can meet the requirements of practical applications.
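The difference between exact and fuzzy matching of two recognized strings can be illustrated with a minimal sketch. The abstract does not say which similarity algorithm the thesis uses, so the normalized Levenshtein (edit) distance below is an assumption, and the function names `edit_distance`, `similarity`, and `is_match`, as well as the threshold values, are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty transcripts are treated as identical
    return 1.0 - edit_distance(a, b) / longest


def is_match(a: str, b: str, threshold: float = 1.0) -> bool:
    """Assumed decision rule: threshold 1.0 is exact matching,
    lower thresholds give fuzzy matching."""
    return similarity(a, b) >= threshold


# Two hypothetical transcripts that differ by one recognition error.
enrolled, spoken = "hey siri", "hey sirl"
print(is_match(enrolled, spoken))                 # exact matching
print(is_match(enrolled, spoken, threshold=0.8))  # fuzzy matching
```

Under these assumptions, exact matching rejects any pair whose transcripts differ at all, which favors rejecting negative cases, while a lower threshold tolerates recognition errors in positive pairs.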
Keywords/Search Tags: speech keyword matching, deep learning, end-to-end, spectrogram