Research Of End-to-End Voice Wake-up

Posted on: 2020-07-23  Degree: Master  Type: Thesis
Country: China  Candidate: N Zhang
GTID: 2428330602468133  Subject: Computer technology
Abstract/Summary:
With the rapid development of artificial intelligence and the growing demand for human-computer interaction, intelligent speech technology has achieved major breakthroughs. Research results in this field not only advance cutting-edge technology but also create substantial market value, so speech technology is of great practical significance. Voice wake-up is an important research direction in intelligent speech: its task is to detect a given set of wake-up words in a continuous stream of speech. For the voice wake-up task with enrollment utterances, this study establishes a voice wake-up system based on deep supervectors. For the fixed wake-up word task, this study focuses on end-to-end (E2E) technology and implements an end-to-end voice wake-up system. In addition, it optimizes the system's parameter configuration and improves system performance by applying various deep learning models. The main work of this study includes:
1. A review of the main line of development of speech recognition, together with a detailed survey of prior work, the current state of research, and the latest developments in voice wake-up and end-to-end technology.
2. A voice wake-up system based on deep supervectors for the task with enrollment utterances. The system uses a DNN as a feature extractor to extract deep supervectors from speech, and then computes the cosine similarity between the deep supervector of the test speech and the deep supervectors of the templates. Experimental results show that the systems based on deep supervectors comprehensively outperform systems based on segmental DTW (S-DTW).
3. An end-to-end voice wake-up system. The system needs only a pre-trained neural network as its acoustic model. Once the acoustic features are fed in, the network's forward propagation and a posterior-probability post-processing module output the confidence score of the wake-up word. This yields an end-to-end framework that does not need a complicated decoding process. Moreover, various deep learning models, including TDNN, LSTM, GRU and TDNN-F, are introduced into the system as acoustic models. Multiple experiments compare the performance of each model in detail and verify the effectiveness of the end-to-end wake-up system.
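The two scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify how posteriors are post-processed, so the smoothing-then-maximum scheme below (a moving average followed by a geometric mean of per-unit peaks) is an assumed, commonly used approach. All function names, window sizes, and the layout of the posterior matrix (one row per frame, one column per output unit) are hypothetical.

```python
import numpy as np


def cosine_score(test_vec, template_vec):
    """Cosine similarity between a test supervector and one template."""
    test_vec = np.asarray(test_vec, dtype=float)
    template_vec = np.asarray(template_vec, dtype=float)
    denom = np.linalg.norm(test_vec) * np.linalg.norm(template_vec)
    # Guard against all-zero vectors, for which the cosine is undefined.
    return float(test_vec @ template_vec / denom) if denom > 0 else 0.0


def smooth_posteriors(posteriors, window=30):
    """Moving average of each output unit over the last `window` frames."""
    posteriors = np.asarray(posteriors, dtype=float)
    smoothed = np.empty_like(posteriors)
    for t in range(len(posteriors)):
        start = max(0, t - window + 1)
        smoothed[t] = posteriors[start:t + 1].mean(axis=0)
    return smoothed


def wake_word_confidence(posteriors, keyword_units,
                         smooth_window=30, max_window=100):
    """Per-frame confidence score for the wake-up word.

    Assumed scheme: smooth the posteriors, take each keyword unit's peak
    inside a sliding window, and combine the peaks by geometric mean.
    """
    smoothed = smooth_posteriors(posteriors, smooth_window)
    scores = np.empty(len(smoothed))
    for t in range(len(smoothed)):
        start = max(0, t - max_window + 1)
        peaks = smoothed[start:t + 1, keyword_units].max(axis=0)
        scores[t] = peaks.prod() ** (1.0 / len(keyword_units))
    return scores


# Toy input: column 0 is filler, columns 1 and 2 are two wake-word units.
toy = np.zeros((10, 3))
toy[:5, 1] = 1.0   # first unit active in frames 0-4
toy[5:, 2] = 1.0   # second unit active in frames 5-9
toy_scores = wake_word_confidence(toy, [1, 2], smooth_window=1, max_window=10)
```

In a sketch like this, a wake-up would be triggered when the confidence score exceeds a tuned threshold. For the enrollment-based system, `cosine_score` would be compared against a threshold in the same way, for example using the best score over all templates.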
Keywords/Search Tags:Voice Wake-up, Deep Supervector, End-to-End, TDNN, GRU, TDNN-F