Research On Keyword Spotting Based On DMLS

Posted on:2015-12-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Zheng

Full Text:PDF

GTID:2308330482479165

Subject:Signal and Information Processing

Abstract/Summary:

Keyword spotting is the task of finding all occurrences of given words in speech data. It is an efficient way to process spoken language and to realize intelligent human-machine communication, and it has broad application prospects. Keyword spotting based on dynamic match lattice spotting (DMLS) is currently one of the mainstream methods. DMLS combines the speed of lattice spotting with dynamic sequence matching: in the lattice search stage, the minimum edit distance is used to compensate for insertion, deletion and substitution errors made by the phoneme recognizer, which yields fast and accurate keyword spotting. Following the characteristics of the DMLS method, this thesis studies lattice generation, index construction, confidence measures and out-of-vocabulary word detection. The main work and innovations are as follows:

(1) The accuracy of the phoneme lattice has a direct impact on keyword spotting performance. To improve lattice accuracy, TRAP features and a multilayer perceptron are used to build a more accurate phoneme lattice generation system. On this basis, a baseline system based on DMLS is built. The system performs a modified Viterbi traversal of the lattice to compile a database of fixed-length phoneme sequences (SDB). In the search stage, the minimum edit distance is used as the confidence score for keyword spotting. Tests show that the proposed method is superior to systems using MFCC and PLP features, and that the recall rate increases by about 5%.

(2) Two problems are addressed: some information is lost in the SDB construction stage, and a query term may be longer than the indexed sequence length. A hybrid index that combines the most probable phoneme sequence with the SDB is proposed. The most probable phoneme sequence is the complete 1-best result of speech recognition; it represents the globally optimal result over the entire lattice and is complementary to the SDB. Moreover, the most probable phoneme sequence is not affected by the sequence length N, so it can be used to detect query terms with longer phoneme sequences. Tests show that, compared with a single SDB index, the hybrid index improves the figure of merit by 1.4% relative.

(3) In a DMLS-based keyword spotting system, the minimum edit distance serves as the confidence measure. While this measure increases the detection rate, it also raises the false alarm rate. To address this problem, this thesis proposes an approach that integrates a posterior probability confidence measure into DMLS. First, the posterior probability of the lattice is introduced into the indexing stage of DMLS. Second, data-driven phoneme substitution, insertion and deletion costs are incorporated for more flexible phoneme sequence matching. Finally, the minimum edit distance and the posterior probability confidence are combined to detect all occurrences of the keywords. The experimental results show that the minimum edit distance and the posterior probability confidence measure are complementary to some degree, and that the equal error rate is reduced by 13.3% relative.

(4) To address the problem of out-of-vocabulary (OOV) words in keyword spotting, a method that incorporates query expansion into dynamic match is proposed. Query expansion and dynamic match are two different ways to compensate for the high degree of uncertainty in OOV pronunciation. Considering their potential complementarity, this thesis presents two fusion methods. The first is result fusion, which runs OOV word detection with query expansion and with dynamic match in parallel and then merges the search results of the two systems. The second is confidence fusion, which combines the minimum edit distance and the pronunciation score into a hybrid confidence measure for OOV word detection and verification. Tests show that the second fusion method is more effective, with a relative improvement of 19.8% in figure of merit.
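
The matching step at the core of DMLS, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the toy phoneme sequences, the location labels, the unit default costs and the linear interpolation used to fuse the edit distance with a posterior probability are all assumptions made for illustration; the abstract does not give the actual cost tables or the fusion formula.

```python
from typing import Callable, List, Sequence, Tuple


def unit_sub_cost(a: str, b: str) -> float:
    """Assumed default: substituting a phoneme costs 1, a match costs 0."""
    return 0.0 if a == b else 1.0


def min_edit_distance(
    query: Sequence[str],
    target: Sequence[str],
    sub_cost: Callable[[str, str], float] = unit_sub_cost,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Weighted minimum edit distance for turning `query` into `target`.

    Passing a data-driven `sub_cost` (for example, one derived from a
    phoneme confusion matrix) is one way to make the matching more flexible.
    """
    # prev[j] is the cost of turning the empty query prefix into target[:j].
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, q in enumerate(query, start=1):
        curr = [i * del_cost]  # turn query[:i] into the empty sequence
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,            # drop query phoneme q
                curr[j - 1] + ins_cost,        # insert target phoneme t
                prev[j - 1] + sub_cost(q, t),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def search_sdb(
    query: Sequence[str],
    sdb: List[Tuple[str, Sequence[str]]],
    max_distance: float,
) -> List[Tuple[str, float]]:
    """Return (location, distance) for every indexed sequence close enough to the query."""
    hits = []
    for location, sequence in sdb:
        distance = min_edit_distance(query, sequence)
        if distance <= max_distance:
            hits.append((location, distance))
    return sorted(hits, key=lambda hit: hit[1])


def fused_confidence(
    distance: float, query_len: int, posterior: float, alpha: float = 0.5
) -> float:
    """Hypothetical fusion: interpolate a normalized edit-distance score with a posterior."""
    med_score = max(0.0, 1.0 - distance / query_len)
    return alpha * med_score + (1.0 - alpha) * posterior


# Toy sequence database: (hypothetical location label, phoneme sequence).
toy_sdb = [
    ("utt1@0.5s", ("k", "ae", "t")),
    ("utt1@2.1s", ("k", "ah", "t")),
    ("utt2@0.9s", ("d", "ao", "g")),
]

if __name__ == "__main__":
    for location, distance in search_sdb(("k", "ae", "t"), toy_sdb, max_distance=1.0):
        print(location, distance, fused_confidence(distance, 3, posterior=0.8))
```

Raising `max_distance` tolerates more recognizer errors and so finds more true occurrences, but it also admits more false alarms. That trade-off is the motivation the abstract gives for combining the edit distance with a second confidence measure.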

Keywords/Search Tags:

Keyword Spotting, Dynamic Match Lattice Spotting, TRAP Features, Minimum Edit Distance, Most Probable Phoneme Sequence, Posterior Probability Confidence Measure, Query Expansion, Out-of-Vocabulary Word

Related items

1	Speech Keyword Spotting And Confidence Measure
2	Research On Speech Keyword Spotting Technology For Mongolian
3	Rapid Keyword Spotting In Continuous Speech
4	Research On Confidence Measure In Speech Keyword Spotting
5	Design And Implementation Of Keyword Spotting System
6	Research On Confidence Measure In Speech Keyword Recognition
7	Research On Keyword Spotting Technology Based On Neural Network
8	Research And Implementation Of Keyword Spotting System With Large Keyword Table In Spontaneous Speech
9	Research Of Small Vocabulary, Speaker-independent Chinese Keyword Spotting Algorithm
10	Fuzzy GMM-based confidence measure towards keyword spotting application