
Research On Fast Query-by-example Spoken Term Detection Based On Template Matching

Posted on: 2014-03-10    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y Feng
GTID: 2268330401476841    Subject: Signal and Information Processing
Abstract/Summary:
Query-by-example spoken term detection (QbE STD) searches a large speech collection and returns the spoken segments that are relevant to a user's query examples. It plays an important role in information security, speech search engines, and the management of speech resources. QbE STD based on template matching is one of the mainstream approaches. However, applying it directly incurs a heavy computational load and does not account for the acoustic variations of the query examples. This thesis therefore focuses on reducing time consumption and on re-ranking the relevant regions, with the aim of improving both detection speed and detection precision. The main contributions are as follows.

A piecewise aggregate approximation (PAA) lower-bound estimate (LBE) for dynamic time warping (DTW) is proposed to address the high time cost of searching for relevant regions with DTW directly. The idea is to speed up detection by reducing the number of DTW computations: PAA lower-bound estimates are computed quickly between the query example and every possible matching region in the utterance corpus, and a K-nearest-neighbor (KNN) search combined with DTW is then used to find the relevant regions. Experimental results show that this method detects 5.9 times as fast as applying DTW directly, with no loss in detection precision.

Segmental dynamic time warping (SDTW) is proposed to remove the redundant computation and matching of DTW. In this method, the test utterances are divided into a series of matching regions, and DTW is then applied within these regions to perform QbE STD. To further increase detection speed, LBE is applied to SDTW-based QbE STD. Finally, pseudo relevance feedback (PRF) is used to account for the acoustic variations of the query examples.
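The lower-bound pruning behind the PAA method (and its reuse with SDTW) can be sketched as follows. This is a minimal sketch, not the thesis's implementation: it assumes real-valued feature frames, a squared-Euclidean frame distance, equal-length sliding-window candidate regions, and a Sakoe-Chiba band of radius `r`, and it uses Keogh's envelope-based PAA bound. The thesis works on phoneme posteriorgrams, whose distance and exact bound may differ. All function names (`dtw_band`, `envelope`, `paa_envelope`, `lb_paa`, `knn_search`) and parameters (`r`, `w`, `hop`) are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def frame_dist(a: Frame, b: Frame) -> float:
    # Assumed local cost: squared Euclidean distance between two frames.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_band(q: Sequence[Frame], c: Sequence[Frame], r: int) -> float:
    # Accumulated DTW cost of two equal-length sequences inside a Sakoe-Chiba band of radius r.
    n, m = len(q), len(c)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            step = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = frame_dist(q[i - 1], c[j - 1]) + step
    return D[n][m]


def envelope(q: Sequence[Frame], r: int):
    # Per-dimension upper/lower envelope of the query over a +/- r frame window.
    n, dims = len(q), len(q[0])
    win = [range(max(0, i - r), min(n, i + r + 1)) for i in range(n)]
    upper = [[max(q[j][k] for j in win[i]) for k in range(dims)] for i in range(n)]
    lower = [[min(q[j][k] for j in win[i]) for k in range(dims)] for i in range(n)]
    return upper, lower


def paa_envelope(upper, lower, w: int):
    # PAA of the envelope: the widest bounds within each block of w frames.
    segs = []
    for s in range(0, len(upper), w):
        e = min(s + w, len(upper))
        u = [max(row[k] for row in upper[s:e]) for k in range(len(upper[0]))]
        l = [min(row[k] for row in lower[s:e]) for k in range(len(lower[0]))]
        segs.append((e - s, u, l))
    return segs


def lb_paa(segs, c: Sequence[Frame], w: int) -> float:
    # Lower bound on dtw_band: distance from each block mean of c to the PAA envelope,
    # weighted by block length. Valid because every candidate frame is matched at least
    # once inside the band, and averaging within a block can only shrink the distance.
    total = 0.0
    for idx, (length, u, l) in enumerate(segs):
        block = c[idx * w: idx * w + length]
        for k, (hi, lo) in enumerate(zip(u, l)):
            mean = sum(f[k] for f in block) / length
            if mean > hi:
                total += length * (mean - hi) ** 2
            elif mean < lo:
                total += length * (lo - mean) ** 2
    return total


def knn_search(query, utterance, k: int, r: int, w: int, hop: int = 1):
    # Exact K-nearest-neighbour search over sliding windows, pruned by lb_paa.
    n = len(query)
    segs = paa_envelope(*envelope(query, r), w)
    starts = range(0, len(utterance) - n + 1, hop)
    bounds = sorted((lb_paa(segs, utterance[s:s + n], w), s) for s in starts)
    best: List[Tuple[float, int]] = []  # max-heap of the k best regions via negated distances
    dtw_calls = 0
    for lb, s in bounds:
        if len(best) == k and lb >= -best[0][0]:
            break  # all remaining bounds are at least this large, so none can beat the k-th best
        d = dtw_band(query, utterance[s:s + n], r)
        dtw_calls += 1
        if len(best) < k:
            heapq.heappush(best, (-d, s))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, s))
    return sorted((-nd, s) for nd, s in best), dtw_calls
```

Because regions are visited in increasing order of their lower bound, the loop can stop as soon as the next bound is no smaller than the current K-th best DTW distance. The returned distances therefore match an exhaustive DTW search, while DTW is evaluated only for the regions that survive pruning.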
Experimental results indicate that this SDTW-based method detects 14.6 times as fast as applying DTW directly and improves detection precision by 5.21% compared with it.

A QbE STD method based on phoneme boundary information and DTW is proposed to overcome the limitations of the lower-bound estimates for DTW and for SDTW. First, the phoneme posterior probabilities of the query examples and of the test material are segmented into segment sequences with the hierarchical agglomerative clustering (HAC) algorithm. The expectation vectors of these segment sequences then form the new queries and new indexes, on which DTW is applied to perform QbE STD. Finally, the detection results are refined with PRF. Experimental results indicate that this method detects 14.4 times as fast as applying DTW directly and improves detection precision by 0.73% compared with it.
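The segment-level representation used in the phoneme-boundary method can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's algorithm: it uses time-constrained HAC that repeatedly merges the pair of neighbouring segments with the smallest Ward increase in squared error, stops at a fixed target number of segments, and uses a squared-Euclidean local distance. The abstract does not specify the merge criterion, the stopping rule, or the distance used on phoneme posteriors. All function names (`mean_vec`, `ward_cost`, `hac_segment`, `segment_means`, `dtw`) are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]


def mean_vec(frames: Sequence[Frame]) -> List[float]:
    # Expectation (mean) vector of a run of frames.
    return [sum(f[k] for f in frames) / len(frames) for k in range(len(frames[0]))]


def ward_cost(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    # Increase in within-segment squared error if adjacent segments a and b are merged.
    ma, mb = mean_vec(a), mean_vec(b)
    return len(a) * len(b) / (len(a) + len(b)) * sum((x - y) ** 2 for x, y in zip(ma, mb))


def hac_segment(frames: Sequence[Frame], num_segments: int) -> List[List[Frame]]:
    # Time-constrained HAC: only neighbouring segments may merge, so segments stay contiguous.
    segs = [[f] for f in frames]
    while len(segs) > num_segments:
        i = min(range(len(segs) - 1), key=lambda t: ward_cost(segs[t], segs[t + 1]))
        segs[i:i + 2] = [segs[i] + segs[i + 1]]
    return segs


def segment_means(frames: Sequence[Frame], num_segments: int) -> List[List[float]]:
    # New query/index representation: one mean vector per HAC segment.
    return [mean_vec(s) for s in hac_segment(frames, num_segments)]


def dtw(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    # Plain DTW with squared-Euclidean local cost, run on segment-mean sequences.
    n, m = len(a), len(b)
    D = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

DTW's cost grows with the product of the two sequence lengths, so compressing both the query and the index from frames into a few segment vectors reduces the number of DTW cells to evaluate accordingly. The greedy search in `hac_segment` recomputes every neighbouring cost on each merge, which keeps the sketch short but is quadratic in the number of frames.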
Keywords/Search Tags: Query-by-Example spoken term detection, Phoneme posterior probabilities, Dynamic time warping, Segmental dynamic time warping, Lower-bound estimate, Piecewise aggregate approximation, Pseudo relevance feedback, Hierarchical agglomerative clustering