
The Research On Speech Recognition And Interactive Applications Based On Deep Learning

Posted on: 2018-11-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Wang
Full Text: PDF
GTID: 2348330542956587
Subject: Electronics and Communications Engineering
Abstract/Summary:
Speech is the most common means of everyday communication because it carries rich information and is easy to use. Voice interaction applies spoken communication to human-machine interaction, making machines more convenient to use. Speech recognition is the most basic and central part of voice interaction, and high recognition accuracy is a prerequisite for accurate interaction. As application scenarios grow more complex, traditional speech recognition methods lack the capacity to model large speech corpora. Deep learning is effective on large-scale data and can be applied to speech recognition to improve recognition accuracy on massive amounts of speech. This thesis constructs a speech recognition algorithm based on deep learning and studies how commands in interactive speech are executed.

In a traditional speech recognition system, the GMM-HMM acoustic model is a shallow model whose modeling capacity becomes insufficient as the speech corpus grows. A deep learning model contains multiple layers of nonlinear computation and can therefore fit nonlinear functions better. A DNN-HMM acoustic model is constructed to improve recognition accuracy, and after training the DNN-HMM-based recognition system achieves higher recognition accuracy.

Training such an acoustic model requires annotating the speech frames. This step is labor-intensive and depends on expert experience, so it cannot meet the needs of massive data. By processing the speech sequence with a recurrent neural network and using a CTC layer as the output layer, the LSTM-CTC model exploits the dependencies within the speech sequence and no longer requires manual labeling. In practice, however, training a multi-layer LSTM network is computationally expensive, takes a long time, and is difficult to converge. Because a generative model can extract distributional features, and taking the sequential nature of speech information into account, a speech recognition model based on a generative model and CTC is proposed.

In speech interaction, accurately extracting commands from speech and performing the corresponding operations is an important performance indicator of the system. Keyword extraction is studied as a way to extract commands from the text. The thesis compares and analyzes four keyword extraction algorithms and finds that the RAKE algorithm completes keyword extraction concisely and effectively. The recognition model is built with TensorFlow, and its final word error rate in testing is 7.16%, close to the human-level 4.58%. With the code implemented, the algorithm is applied to voice interaction with a simple trolley console, and the results show that it can accurately parse phrase commands and perform the corresponding operations through the interface functions.
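The abstract reports its result as a word error rate (WER). As a minimal sketch of how that metric is commonly computed (not the thesis's own evaluation code, which is not shown here), WER is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the example phrases below are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; transcripts are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up command phrase: one of four reference words is dropped.
print(word_error_rate("turn left and stop", "turn left stop"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.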
Keywords/Search Tags: Speech recognition, Speech interaction, Deep learning, End-to-end model