
A Study Of Key Problems In Spoken Term Detection

Posted on: 2014-06-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B X Li
GTID: 1268330401463100
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of modern media and the Internet, large volumes of speech data have become an important carrier of information. Spoken term detection draws on theory from speech recognition, information retrieval, and natural language processing. It aims to extract useful knowledge from complex speech data by detecting individual occurrences of specified search terms. This dissertation addresses several key problems in spoken term detection, namely post-processing, hierarchical indexing, keyword matching, and confidence measures. Its main contributions and innovations are as follows:
1. Weighted syllable confusion matrix generation algorithm
Confusion matrices have important applications in query expansion and distance metrics. A confusion matrix is usually generated from the alignment between the 1-best hypotheses and the reference transcription. However, a syllable in the 1-best hypotheses is not necessarily the optimal choice, and recognition results on noisy data are frequently wrong, so confusion matrices generated by this traditional method are inaccurate. We instead generate a weighted syllable confusion matrix from the confusion network: time information is used to align the confusion network with the reference, and only the slices that contain the correct syllable are considered. The confusion weights between syllables are calculated from the time overlap and the normalized acoustic score. Experiments show that the algorithm performs well both at high recognition error rates and with little training data.
2. Confidence feature extraction algorithm based on the word activation force model
Confidence measures are very important in speech recognition post-processing and in ranking retrieved results. Currently, most confidence features are derived from decoding information, so extracting effective confidence features from higher-level information sources has become an important problem.
A word in a sentence is closely related to its neighbors, because they interact through syntactic and semantic information. The word activation force model captures these relations from statistics of word occurrence and co-occurrence. We propose a confidence feature extraction algorithm based on the word activation force model, which measures how well a word matches its context in semantic space. Experiments show that the proposed feature adds a new information source for confidence estimation that complements existing features well, and that combining it with confidence features derived from decoding information effectively improves confidence evaluation performance.
3. Keyword matching algorithm based on the acoustic distance
Speech recognition errors are inevitable in a spoken term detection system, and queries are often out-of-vocabulary words, so exact matching is no longer applicable. Edit distance has been used to address these problems through approximate matching, but only with a very simple error cost model based on a small set of heuristic rules. To take the acoustic confusability between syllables into account during string matching, we propose the acoustic distance, which assigns smaller costs to particularly confusable pairs of syllables. These costs are derived from syllable confusion probabilities, which can be obtained from the weighted syllable confusion matrix. Experiments show that the acoustic distance provides more robust approximate string matching than the edit distance.
4. Fast syllable sequence search algorithm based on hierarchical indexing
Acoustic distance matching improves the accuracy of spoken term detection by allowing for syllable substitution, insertion, and deletion errors; however, this comes at the cost of increased search time.
The dominant computational cost at search time is calculating the acoustic distance between the target syllable sequence and every indexed syllable sequence in the index database. We propose a hierarchical indexing method that efficiently predicts the subset of indexed sequences likely to have the best acoustic distances, avoiding the calculation for all other sequences. Hierarchical indexing restricts the search space to syllable sequences likely to have been generated by the search term, but acoustic distances must still be computed for that subset. We therefore precompute the acoustic distances between the syllable sequences in the index database and store them in a distance index, so that sequence similarity can be obtained quickly by looking it up in the distance index. Experimental results demonstrate that hierarchical indexing and the acoustic distance index require more storage but greatly increase search speed, with no loss in spoken term detection accuracy.
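The first contribution builds the confusion matrix from the confusion network instead of 1-best alignments, using time overlap and normalized acoustic scores as weights. The Python sketch below illustrates that idea under stated assumptions: reference syllables carry start and end times, each confusion-network slice carries a time span and normalized scores for its competing syllables, and the weight of one observation is the product of its relative time overlap and its score. `Slice`, `time_overlap`, and `weighted_confusion_matrix` are hypothetical names, and the product rule is an illustrative choice, not the dissertation's exact formula.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical sketch: names and the weighting rule are illustrative assumptions.


@dataclass
class Slice:
    """One confusion-network slice: a time span plus competing syllables."""
    start: float
    end: float
    # syllable -> normalized acoustic score (posterior) within this slice
    arcs: dict = field(default_factory=dict)


def time_overlap(a0, a1, b0, b1):
    """Length of the intersection of [a0, a1] and [b0, b1] (0 if disjoint)."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def weighted_confusion_matrix(reference, slices):
    """Estimate P(hyp | ref) from time-aligned reference syllables and CN slices.

    reference: list of (syllable, start, end) tuples.
    Weight of one (ref, hyp) observation = (overlap / ref duration) * score,
    an assumed combination rule.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for ref_syl, r0, r1 in reference:
        duration = r1 - r0
        if duration <= 0:
            continue
        for s in slices:
            ov = time_overlap(r0, r1, s.start, s.end)
            # Keep only time-aligned slices that contain the correct syllable.
            if ov <= 0 or ref_syl not in s.arcs:
                continue
            for hyp_syl, score in s.arcs.items():
                counts[ref_syl][hyp_syl] += (ov / duration) * score
    # Row-normalize the accumulated weights into confusion probabilities.
    matrix = {}
    for ref_syl, row in counts.items():
        total = sum(row.values())
        if total > 0:
            matrix[ref_syl] = {h: w / total for h, w in row.items()}
    return matrix


# Example with made-up times and scores.
reference = [("ba", 0.0, 0.2), ("ma", 0.2, 0.4), ("ga", 0.4, 0.6)]
slices = [
    Slice(0.0, 0.2, {"ba": 0.7, "pa": 0.3}),
    Slice(0.2, 0.4, {"na": 0.6, "ma": 0.4}),
    Slice(0.4, 0.6, {"ta": 1.0}),  # lacks "ga", so it is skipped for "ga"
]
matrix = weighted_confusion_matrix(reference, slices)
print(matrix)
```

Because a slice is used only when it contains the reference syllable, a badly misrecognized region (the "ga" span here) contributes nothing instead of polluting the matrix with unrelated pairs.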
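The second contribution scores how well a recognized word fits its context using the word activation force (WAF) model, which is built from word occurrence and co-occurrence statistics. The sketch below assumes the commonly cited WAF form waf(i -> j) = (f_ij / f_i) * (f_ij / f_j) / d_ij^2, where f_i is a word's frequency, f_ij counts word i preceding word j within a window, and d_ij is their mean distance. The feature itself (mean WAF between a word and its neighbors) and the names `train_waf` and `waf_context_score` are hypothetical; the dissertation's exact feature definition is not given in this abstract.

```python
from collections import Counter

# Hypothetical sketch: the WAF form is assumed and the feature is illustrative.


def train_waf(sentences, window=5):
    """Estimate waf(i -> j) = (f_ij / f_i) * (f_ij / f_j) / d_ij ** 2."""
    freq, cooc, dist_sum = Counter(), Counter(), Counter()
    for sent in sentences:
        freq.update(sent)
        for a, wi in enumerate(sent):
            # Count word pairs where wi precedes sent[b] within `window` tokens.
            for b in range(a + 1, min(a + 1 + window, len(sent))):
                pair = (wi, sent[b])
                cooc[pair] += 1
                dist_sum[pair] += b - a
    waf = {}
    for (wi, wj), f_ij in cooc.items():
        d_ij = dist_sum[(wi, wj)] / f_ij  # mean distance of the co-occurrences
        waf[(wi, wj)] = (f_ij / freq[wi]) * (f_ij / freq[wj]) / d_ij ** 2
    return waf


def waf_context_score(words, k, waf, window=5):
    """Mean activation force between words[k] and each neighbor within
    `window` positions, following word order (left -> word -> right)."""
    target = words[k]
    forces = []
    for m in range(max(0, k - window), min(len(words), k + window + 1)):
        if m == k:
            continue
        pair = (words[m], target) if m < k else (target, words[m])
        forces.append(waf.get(pair, 0.0))  # unseen pairs exert no force
    return sum(forces) / len(forces) if forces else 0.0


# Toy corpus (made up for illustration).
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
waf = train_waf(corpus, window=2)
print(waf_context_score(["the", "cat", "sat"], 1, waf, window=2))  # well-supported word
print(waf_context_score(["the", "ran", "sat"], 1, waf, window=2))  # poorly supported word
```

A score like this would be one extra input to a confidence classifier alongside decoder-derived features, which is the complementary use the abstract describes.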
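The third contribution replaces the heuristic costs of plain edit distance with costs derived from syllable confusion probabilities. A minimal sketch is a standard dynamic-programming weighted edit distance, assuming a substitution cost of 1 - P(h | r) and unit insertion and deletion costs; both are illustrative choices, since the abstract does not state the exact cost mapping. `acoustic_distance` and the `confusion` table are hypothetical.

```python
def acoustic_distance(query, hyp, confusion, ins_cost=1.0, del_cost=1.0):
    """Weighted edit distance between two syllable sequences.

    confusion[r][h] is an assumed confusion probability P(h | r), e.g. a row
    of a weighted syllable confusion matrix. Substitution cost 1 - P(h | r)
    is a hypothetical mapping that makes confusable pairs cheaper.
    """
    def sub_cost(r, h):
        if r == h:
            return 0.0
        return 1.0 - confusion.get(r, {}).get(h, 0.0)

    n, m = len(query), len(hyp)
    # d[i][j] = cheapest alignment cost of query[:i] with hyp[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost,  # query syllable deleted
                d[i][j - 1] + ins_cost,  # extra syllable inserted in hyp
                d[i - 1][j - 1] + sub_cost(query[i - 1], hyp[j - 1]),
            )
    return d[n][m]


# Hypothetical confusion probabilities: "zhong" is often recognized as "zong".
confusion = {"zhong": {"zong": 0.4}}
query = ["zhong", "guo"]
print(acoustic_distance(query, ["zong", "guo"], confusion))  # confusable: cheaper
print(acoustic_distance(query, ["cong", "guo"], confusion))  # unrelated: full cost
```

Under plain edit distance both hypotheses would cost the same single substitution; with confusion-derived costs the acoustically plausible error is ranked as the closer match.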
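For the fourth contribution, the idea of first pruning the candidate set and then reading precomputed distances instead of recomputing them can be sketched as follows. The two-level structure here is an assumption made for illustration: level 1 buckets indexed sequences by length, which is a safe filter because, with unit insertion and deletion costs, the weighted edit distance is at least the length difference; the distance index stores precomputed pairwise distances up to a threshold `max_dist`. `HierarchicalIndex`, `sub_cost`, and `CONFUSABLE` are hypothetical names, and the dissertation's actual hierarchy may differ.

```python
from collections import defaultdict
from itertools import combinations


def acoustic_distance(a, b, sub_cost):
    """Weighted edit distance: insertion/deletion cost 1, substitution cost
    sub_cost(x, y) assumed to lie in [0, 1] and to be symmetric."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [float(i)] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j] + 1.0, cur[j - 1] + 1.0,
                         prev[j - 1] + sub_cost(a[i - 1], b[j - 1]))
        prev = cur
    return prev[-1]


class HierarchicalIndex:
    """Hypothetical two-level index: length buckets + precomputed distances."""

    def __init__(self, sequences, sub_cost, max_dist):
        self.sub_cost, self.max_dist = sub_cost, max_dist
        self.known = {tuple(s) for s in sequences}
        # Level 1: bucket indexed sequences by length.
        self.by_len = defaultdict(set)
        for s in self.known:
            self.by_len[len(s)].add(s)
        # Distance index: pairwise distances <= max_dist, computed offline.
        self.dist_index = defaultdict(list)
        for a, b in combinations(sorted(self.known), 2):
            if abs(len(a) - len(b)) > max_dist:
                continue  # the length lower bound already exceeds the threshold
            d = acoustic_distance(a, b, sub_cost)
            if d <= max_dist:
                self.dist_index[a].append((b, d))
                self.dist_index[b].append((a, d))

    def search(self, query):
        """Return (sequence, distance) pairs within max_dist, best first."""
        q = tuple(query)
        if q in self.known:
            # Query is itself indexed: answer by lookup, no distance computation.
            hits = [(q, 0.0)] + self.dist_index.get(q, [])
        else:
            # Otherwise compute distances only in buckets that survive pruning.
            hits = []
            for length, bucket in self.by_len.items():
                if abs(length - len(q)) > self.max_dist:
                    continue
                for s in bucket:
                    d = acoustic_distance(q, s, self.sub_cost)
                    if d <= self.max_dist:
                        hits.append((s, d))
        return sorted(hits, key=lambda hit: (hit[1], hit[0]))


CONFUSABLE = {("zhong", "zong"): 0.4, ("zong", "zhong"): 0.4}  # made-up costs


def sub_cost(x, y):
    return 0.0 if x == y else CONFUSABLE.get((x, y), 1.0)


index = HierarchicalIndex(
    [("zhong", "guo"), ("zong", "guo"), ("mei", "guo"), ("zhong", "guo", "ren")],
    sub_cost, max_dist=1.0)
print(index.search(["zhong", "guo"]))  # answered from the distance index
print(index.search(["zhong", "gou"]))  # computed within surviving buckets
```

The trade-off mirrors the one reported in the abstract: this naive build is quadratic in the number of indexed sequences and the distance index can grow quadratically in storage, but the length filter discards no true matches, and indexed queries are answered by lookup alone.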
Keywords/Search Tags: speech recognition, spoken term detection, confidence measure, word activation force, confusion matrix