
Research On Keyword Spotting Based On DMLS

Posted on: 2015-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zheng
Full Text: PDF
GTID: 2308330482479165
Subject: Signal and Information Processing
Abstract/Summary:
Keyword spotting refers to finding all occurrences of given words in speech data. It is an efficient way to process spoken language and to realize intelligent human-machine communication, and it has broad application prospects. Keyword spotting based on dynamic match lattice spotting (DMLS) is currently one of the mainstream methods. DMLS combines the speed of lattice spotting with dynamic sequence matching: in the lattice search stage, the minimum edit distance is used to compensate for insertion, deletion and substitution errors made by the phoneme recognizer, which yields fast and accurate keyword spotting. This thesis studies lattice generation, index construction, confidence measures and out-of-vocabulary word detection in light of the characteristics of the DMLS method. The main work and innovations are as follows:

(1) The accuracy of the phoneme lattice directly affects keyword spotting performance. To improve lattice accuracy, TRAP features and a multilayer perceptron are used to build a more accurate phoneme lattice generation system. On this basis, a baseline DMLS system is built. The system performs a modified Viterbi traversal to compile a database of fixed-length phoneme sequences (SDB). In the search stage, the minimum edit distance serves as the confidence score for keyword spotting. Tests show that the proposed method outperforms systems using MFCC and PLP features, with the recall rate increasing by about 5%.

(2) Some information is lost in the SDB construction stage, and a query term may be longer than the indexed sequence length. To address both problems, a hybrid index that combines the most probable phoneme sequence with the SDB is proposed. The most probable phoneme sequence is the full 1-best result of speech recognition; it represents the globally optimal result over the entire lattice and complements the SDB. Moreover, it is not affected by the sequence length N, so it can be used to detect query terms with longer phoneme sequences. Tests show that the hybrid index improves the figure of merit by 1.4% relative to a single SDB index.

(3) In a DMLS-based keyword spotting system, the minimum edit distance is used as the confidence measure. While this measure increases the detection rate, it also raises the false alarm rate. To address this problem, this thesis proposes an approach that integrates a posterior probability confidence measure into DMLS. First, the lattice posterior probability is introduced into the indexing stage of DMLS. Second, data-driven phoneme substitution, insertion and deletion costs are incorporated for more flexible phoneme sequence matching. Finally, the minimum edit distance and the posterior probability confidence are blended to detect all occurrences of the keywords. The experimental results show that the minimum edit distance and the posterior probability confidence measure are complementary to some degree, and that the equal error rate is reduced by 13.3% relative.

(4) To address out-of-vocabulary (OOV) words in keyword spotting, a method that incorporates query expansion into dynamic match is proposed. Query expansion and dynamic match are two different ways to compensate for the high degree of uncertainty in OOV pronunciation. Considering their potential complementarity, this thesis presents two fusion methods. The first is result fusion, which runs OOV word detection with query expansion and with dynamic match in parallel and then merges the search results of the two systems. The second is confidence fusion, which combines the minimum edit distance and the pronunciation score into a hybrid confidence measure for OOV word detection and verification. Tests show that the second fusion method is more effective, with a 19.8% relative improvement in the figure of merit.
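
The dynamic matching step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the phoneme symbols, the cost values and the threshold are all hypothetical, and the SDB is reduced to a plain list of (identifier, phoneme sequence) pairs. It only shows how a minimum edit distance with separate substitution, insertion and deletion costs can score a query against indexed phoneme sequences.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical type: maps (query phoneme, observed phoneme) to a substitution cost.
SubCosts = Dict[Tuple[str, str], float]


def min_edit_distance(
    query: Sequence[str],
    observed: Sequence[str],
    sub_cost: Optional[SubCosts] = None,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Weighted minimum edit distance between two phoneme sequences."""
    sub_cost = sub_cost or {}
    rows, cols = len(query) + 1, len(observed) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # First column: every query phoneme is missing from the observation.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: every observed phoneme is an extra insertion.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            q, o = query[i - 1], observed[j - 1]
            # Unlisted substitutions fall back to a unit cost (an assumption).
            s = 0.0 if q == o else sub_cost.get((q, o), 1.0)
            dist[i][j] = min(
                dist[i - 1][j - 1] + s,      # match or substitution
                dist[i - 1][j] + del_cost,   # deletion
                dist[i][j - 1] + ins_cost,   # insertion
            )
    return dist[-1][-1]


def search_sdb(
    sdb: List[Tuple[str, Sequence[str]]],
    query: Sequence[str],
    threshold: float,
    sub_cost: Optional[SubCosts] = None,
) -> List[Tuple[str, float]]:
    """Return (identifier, distance) for sequences within the threshold, best first."""
    hits = []
    for seq_id, seq in sdb:
        d = min_edit_distance(query, seq, sub_cost)
        if d <= threshold:
            hits.append((seq_id, d))
    return sorted(hits, key=lambda h: h[1])


# Toy data: three indexed sequences and one confusable phoneme pair.
toy_sdb = [("u1", ["k", "ae", "t"]), ("u2", ["k", "ae", "d"]), ("u3", ["b", "ih", "g"])]
toy_costs = {("t", "d"): 0.3}
results = search_sdb(toy_sdb, ["k", "ae", "t"], threshold=0.5, sub_cost=toy_costs)
```

In this toy setup a low cost for the confusable pair lets the second sequence pass the threshold, which also hints at why a distance-only confidence can raise the false alarm rate.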
Keywords/Search Tags: Keyword Spotting, Dynamic Match Lattice Spotting, TRAP Features, Minimum Edit Distance, Most Probable Phoneme Sequence, Posterior Probability Confidence Measure, Query Expansion, Out-of-Vocabulary Word