
Research On Spoken Term Detection Technology In Continuous Speech Based On Sample Template

Posted on: 2022-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Li
GTID: 2518306353477174
Subject: Master of Engineering
Abstract/Summary:
Spoken term detection technology searches and analyzes large volumes of speech and audio data and returns the audio that users are looking for, which reduces the time spent on manual listening and screening and raises the level of machine intelligence. However, under low-resource conditions with unrestricted languages and arbitrary keywords, spoken term detection systems based on traditional large vocabulary continuous speech recognition (LVCSR) and HMM models fail. Spoken term detection based on sample template queries therefore has great research value in this setting: the method needs only a few sample templates of the speech segment to be detected in order to retrieve that segment from a large amount of audio data. Against this background, this thesis studies spoken term detection in continuous speech based on sample templates.

Firstly, this thesis introduces the preprocessing and feature extraction techniques used for speech signals. It describes three acoustic features (MFCC, PLP, and bottleneck features) and introduces speaker normalization to reduce differences between speakers. To meet the requirements of this thesis, a self-made spoken term detection dataset is used for the subsequent research.

Secondly, this thesis studies unsupervised sample template spoken term detection in continuous speech based on DTW. To address the defects of the original DTW detection algorithm, an improved algorithm based on DTW distance analysis is adopted, and a parallel accelerated detection strategy using dual male and female templates is proposed. In addition, the sliding strategy used when matching the keyword template against the audio is improved, and a threshold derived from distance statistics is used in the keyword detection stage. The algorithms are implemented in Python, and experiments analyze step by step how the detection parameters, speech features, and template selection method affect the detection results. With the best detection parameters, bottleneck features, and the proposed parallel accelerated matching strategy with dual male and female templates, the keyword recall rate reaches 76.58% on the self-made dataset. This verifies the advantage of the unsupervised sample template query method based on DTW distance analysis in keyword recall under these requirements.

Finally, this thesis studies supervised sample template spoken term detection in continuous speech based on CNN. The distance matrix between the spoken keyword and the audio is treated as an image and fed to the VGG16 image classification network for training, which yields the final keyword classification model. This part of the thesis focuses on how well the method generalizes across languages, so the training set and the test set are deliberately drawn from different languages. The algorithms are again implemented in Python. Experiments show that the supervised spoken term detection algorithm generalizes to some degree across languages. On the self-made keyword dataset, a classification threshold of 0.2 gives a keyword recall rate of 89.19% and a precision rate of 64.71%, while a threshold of 0.3 gives a recall rate of 74.77% and a precision rate of 83.00%. This verifies that the supervised CNN sample template query method is general and achieves higher precision under these requirements.
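The unsupervised stage described above (sliding a keyword template over the audio, scoring each window with DTW, and thresholding on distance statistics) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the fixed-hop sliding window, the length normalization, and the rule "mean minus k standard deviations" with k = 2.0 are all assumptions made for illustration, and the one-dimensional toy frames stand in for real MFCC, PLP, or bottleneck features.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, segment):
    """Length-normalized DTW distance between two sequences of frames."""
    n, m = len(template), len(segment)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of template[:i] with segment[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(template[i - 1], segment[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def sliding_dtw(template, utterance, hop=1):
    """Score every template-length window; returns (start, distance) pairs."""
    win = len(template)
    return [
        (start, dtw_distance(template, utterance[start:start + win]))
        for start in range(0, len(utterance) - win + 1, hop)
    ]


def detect(template, utterance, hop=1, k=2.0):
    """Flag windows whose distance is at most mean - k * std of all windows.

    The threshold rule and k are hypothetical; the thesis only states that
    the threshold is based on distance statistics.
    """
    scores = sliding_dtw(template, utterance, hop)
    dists = [d for _, d in scores]
    threshold = statistics.mean(dists) - k * statistics.pstdev(dists)
    hits = [(s, d) for s, d in scores if d <= threshold]
    return hits, threshold


# Toy example: the template is embedded at frame 8 of a constant background.
template = [[0.0], [1.0], [2.0], [1.0], [0.0]]
utterance = [[5.0]] * 8 + template + [[5.0]] * 7
hits, threshold = detect(template, utterance)
print(hits, threshold)
```

In this toy setup the window starting at frame 8 matches the template exactly, so its DTW distance is zero and it falls below the statistical threshold. A dual-template variant along the lines the abstract describes could run `sliding_dtw` once per template and keep the smaller distance for each window, but how the thesis combines the two templates is not stated here.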
Keywords/Search Tags: Spoken term detection, Sample template, Speech feature, Dynamic time warping, CNN