
Research On System Combination For Spoken Term Detection

Posted on: 2016-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: P Li
Full Text: PDF
GTID: 2308330482979211
Subject: Military Intelligence
Abstract/Summary:
With the arrival of the big data era, the rapidly growing volume of spoken data calls for ways to index and search speech. Spoken term detection (STD) is the task of automatically locating the occurrences of a specified query term in a large audio archive. In a typical STD system, a large-vocabulary continuous speech recognizer serves as a front-end subsystem that converts speech into symbolic sequences, which are then indexed for fast search. Search accuracy and speed, both of which depend on the performance of the front-end subsystem, are the two main factors limiting practical deployment of STD systems. To further improve STD performance, this thesis studies three topics: combining search results from different automatic speech recognition (ASR) systems, STD based on lattice fusion, and a two-stage score normalization method for STD. The main contributions are as follows.

(1) An STD system that combines search results from different ASR systems is presented. First, the audio data is transcribed by each ASR system. The output of each system is then indexed separately, and each query is searched against every index. The scores of the hits are usually normalized. Finally, the hit lists returned by the different systems are merged into a single meta-hit list for the query, and a combined score is calculated for each hit. This thesis explores a system-combination STD method built on complementary acoustic models. To reduce the number of keywords that a large-vocabulary recognizer misses because of pruning errors, a keyword soft beam pruning method is applied. Results show that score normalization improves the ATWV by 30% on average, and that the proposed system combination method yields a further 10% gain.

(2) An STD system based on lattice fusion is proposed. The performance of an STD system depends on the accuracy of the underlying ASR system. Because a lattice represents the multiple-hypothesis output of a speech recognizer, STD systems often index lattices rather than one-best transcripts. Lattice fusion proceeds as follows. First, the audio data is transcribed by several diverse ASR systems, each of which outputs a lattice. The lattices are then fused so that the complementary information in each one can be exploited. Finally, the fused lattice is used for retrieval. This work studies a lattice fusion method based on weighted finite-state transducers (WFSTs). Experimental results show that lattice fusion reduces the word error rate (WER) of the ASR system by 5.3% compared to the best single lattice.

(3) A two-stage score normalization method is proposed. Many STD systems make “hit/false alarm (FA)” decisions based on the lattice-based posterior probability, which is not comparable across keywords. Score normalization is therefore essential for an STD system. In the first stage, two novel features are integrated into a discriminative score normalization method so that the new scores separate hits from false alarms more reliably. In the second stage, a metric-based normalization method is applied as a post-processing step to further optimize the term-weighted value (TWV) evaluation metric. Experimental results show that the two-stage method combines the advantages of the discriminative and metric-based approaches and performs 5.8% better than the best system using a single score normalization method.
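The hit-list combination flow in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say which normalization or merging rule was used, so the sketch assumes sum-to-one normalization per keyword and a weighted sum of the scores of time-overlapping hits. The hit record fields (`term`, `file`, `start`, `end`, `score`) and the function names are hypothetical.

```python
from collections import defaultdict


def sum_to_one(hits):
    """Rescale scores so that, within one system, each term's scores sum to 1.

    Assumption: sum-to-one normalization stands in for the unspecified
    normalization step mentioned in the abstract.
    """
    totals = defaultdict(float)
    for h in hits:
        totals[h["term"]] += h["score"]
    return [dict(h, score=h["score"] / totals[h["term"]])
            for h in hits if totals[h["term"]] > 0]


def overlaps(a, b):
    """True if two hits are for the same term and file and overlap in time."""
    return (a["term"] == b["term"] and a["file"] == b["file"]
            and a["start"] < b["end"] and b["start"] < a["end"])


def merge_hit_lists(system_hits, weights=None):
    """Merge per-system hit lists into one meta-hit list.

    Overlapping hits are treated as the same detection and their normalized
    scores are combined by a weighted sum (an assumed combination rule).
    The merged hit keeps the time span of the first hit seen.
    """
    if weights is None:
        weights = [1.0 / len(system_hits)] * len(system_hits)
    merged = []
    for hits, w in zip(system_hits, weights):
        for h in sum_to_one(hits):
            for m in merged:
                if overlaps(m, h):
                    m["score"] += w * h["score"]
                    break
            else:
                # No overlapping detection yet: start a new meta-hit.
                merged.append(dict(h, score=w * h["score"]))
    return merged


if __name__ == "__main__":
    system_a = [
        {"term": "data", "file": "f1", "start": 1.0, "end": 1.5, "score": 0.6},
        {"term": "data", "file": "f1", "start": 7.0, "end": 7.4, "score": 0.2},
    ]
    system_b = [
        {"term": "data", "file": "f1", "start": 1.1, "end": 1.6, "score": 0.9},
    ]
    for hit in merge_hit_lists([system_a, system_b]):
        print(hit)
```

In this sketch a detection found by both systems accumulates score from each, so it ranks above a detection found by only one system. Equal system weights are the default; tuning them on held-out data would be a natural extension.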
Keywords/Search Tags:Spoken Term Detection, System Combination, Weighted Finite State Transducer, Lattice Fusion, Confidence Score, Score Normalization, Discriminative Modeling