Research On Keyword Spotting Technology Based On Neural Network

Posted on: 2022-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: G Chen
GTID: 2518306764480264
Subject: Automation Technology
Abstract/Summary:
Keyword spotting, one of the active research topics in automatic speech recognition (ASR), detects specific speech content in audio. Because it reduces labor costs and makes computers more intelligent, it is widely used. Current mainstream keyword spotting algorithms achieve extremely high detection accuracy on predefined keywords, but their performance drops sharply under three constraints: a low-resource corpus, support for custom keywords, and keyword localization. To address these problems, this thesis improves query-by-example keyword spotting. The main work is as follows.

First, to address the inability of existing audio keyword spotting algorithms to locate custom keywords, the thesis proposes an audio keyword localization method based on the slash (oblique-line) feature. By studying what positive pairs of keyword audio and audio to be detected have in common, the thesis finds that the presence of the keyword is equivalent to the presence of a slash feature in the feature matrix of the audio to be detected. Based on this observation, the thesis improves the Otsu algorithm to detect and locate keywords: the improved algorithm binarizes the feature matrix with an adaptive threshold, then finds the true oblique region of the matrix with a keyword localization algorithm based on connected regions. In experiments on a custom dataset, the proposed localization method is more accurate than a DTW algorithm with localization capability: the IoU increases by 0.14 and the hit rate by 0.12.

Second, because existing audio keyword spotting algorithms are poorly suited to low-resource corpora and do not support custom keywords, the thesis proposes an end-to-end audio keyword spotting method. Using multi-task learning, it designs a multilingual bottleneck feature network to extract features from the low-resource corpus. A feature matrix generation module then connects the multilingual bottleneck feature extraction network to the keyword detection network. The end-to-end design reduces the systematic errors between modules and ultimately improves the detection capability of the system. Experimental results show that the proposed detection model is more accurate: on the QUESST2014 dataset, the end-to-end keyword spotting model reduces the minimum normalized cross-entropy (Cnxe^min) metric by 0.03 compared with the current state-of-the-art query-by-example keyword spotting model. In addition, the end-to-end model needs few keyword examples, which makes it suitable for low-resource corpus scenarios.

Third, the thesis implements an online audio keyword spotting system with keyword detection and keyword localization on the PC platform and tests each functional module. The experimental results show that the system takes 10 s on average to detect keywords. Each functional module runs completely and meets the functional and non-functional requirements.
Keywords/Search Tags: Keyword Spotting, User-defined Keywords, Keyword Localization, Bottleneck Features