Research on WFST-Based Spoken Term Detection

Posted on: 2015-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: L H Lu
Full Text: PDF
GTID: 2308330482479164
Subject: Signal and Information Processing
Abstract/Summary:
Spoken Term Detection (STD) searches a large collection of speech data and returns the segments relevant to a user's query. It plays an important role in military and information security applications, in the classification and management of speech resources, in speech search engines, and in related areas. Recently, Spoken Term Detection based on the Weighted Finite-State Transducer (WFST) has become increasingly popular and is receiving more and more attention. Within the WFST framework, this thesis focuses on improving the lattice structure, on query expansion for out-of-vocabulary (OOV) query retrieval, and on setting the decision threshold, with the aim of improving both retrieval precision and retrieval speed. The main contributions are as follows.
(1) A Spoken Term Detection system based on WFST is constructed within the WFST theoretical framework. In the indexing stage, lattices are transformed directly into automata. After preprocessing, timed factor transducers are constructed from these automata. Finally, the index is obtained by taking the union of the timed factor transducers and optimizing the result. In the searching stage, the queries are transformed into automata and then composed with the index. After optimization, the automaton representing the search results is obtained. Experimental results show that, compared with the traditional method, the WFST system searches noticeably faster.
(2) An indexing method based on the confusion network instead of the lattice is presented in the WFST framework to address the redundant information and complex structure of lattices. In the indexing stage, confusion networks are first extracted from the lattices and then transformed into automata. The index is then constructed with the general weighted automata indexation algorithm. In the retrieval stage, the composition algorithm is used for searching. Experimental results show that, compared with the lattice-based WFST index, the confusion-network-based WFST index is smaller and faster to search while maintaining retrieval accuracy.
(3) A query expansion method for OOV queries based on a phonetic confusion model is presented in the WFST framework. It addresses the OOV problem by expanding each query into multiple pronunciation sequences. The phonetic confusion model is represented by a phoneme-to-phoneme (P2P) transducer, which is derived from the phonetic confusion matrix and reflects recognition errors and the confusion between pairs of phonemes. First, a pronunciation sequence for the query is generated by a Grapheme-to-Phoneme model. Then, the pronunciation sequence is expanded into N-best sequences by the phonetic confusion model. This compensates for mismatches between the index and query representations caused by recognition errors and effectively reduces the miss rate. The experimental results show that the expansion based on the phonetic confusion model significantly improves the OOV retrieval performance of the system.
(4) A term-specific thresholding method based on the relevance score distribution is presented for the decision stage of the STD system, addressing the poor performance of the global threshold used in traditional systems. At the decision stage, a different threshold is set for each query according to the relevance score distribution of its candidates. First, the distribution of all candidate scores retrieved for a query term is modeled by an exponential mixture model. Then, the parameters are estimated in an unsupervised manner using the EM algorithm. Since the EM algorithm is sensitive to its initial values, K-means clustering is used for initialization instead of random initialization. Finally, the threshold is calculated by the Bayes minimum risk rule. The experimental results show that the thresholding method proposed in this thesis outperforms the other methods compared on the precision-recall (PR) curve.
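The term-specific thresholding steps in item (4) can be sketched in Python as shown here. This is only an illustrative sketch, not the thesis's implementation: the abstract does not give the exact form of the mixture, so the code assumes two exponential components on strictly positive scores (a fast-decaying one for false alarms and a slow-decaying one for correct hits), a simple 1-D two-cluster K-means for initialization, and user-chosen costs for the Bayes minimum risk rule. All function names and parameters are hypothetical.

```python
import math
import random


def kmeans_1d_two_clusters(scores, iters=50):
    """1-D K-means with k=2; returns (low mean, high mean, low-cluster fraction)."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        raise ValueError("need at least two distinct scores")
    for _ in range(iters):
        mid = (lo + hi) / 2.0  # nearest-center boundary in 1-D
        low = [s for s in scores if s <= mid]
        high = [s for s in scores if s > mid]
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return lo, hi, len(low) / len(scores)


def fit_exponential_mixture(scores, iters=200):
    """EM for pi_fa*Exp(lam_fa) + (1-pi_fa)*Exp(lam_hit), assuming scores > 0."""
    mean_fa, mean_hit, pi_fa = kmeans_1d_two_clusters(scores)
    lam_fa, lam_hit = 1.0 / mean_fa, 1.0 / mean_hit  # rate = 1 / cluster mean
    n = len(scores)
    for _ in range(iters):
        # E-step: posterior probability that each candidate is a correct hit.
        resp = []
        for s in scores:
            p_fa = pi_fa * lam_fa * math.exp(-lam_fa * s)
            p_hit = (1.0 - pi_fa) * lam_hit * math.exp(-lam_hit * s)
            resp.append(p_hit / (p_fa + p_hit))
        # M-step: weighted maximum-likelihood updates.
        n_hit = sum(resp)
        n_fa = n - n_hit
        pi_fa = n_fa / n
        lam_hit = n_hit / sum(r * s for r, s in zip(resp, scores))
        lam_fa = n_fa / sum((1.0 - r) * s for r, s in zip(resp, scores))
    if lam_fa < lam_hit:  # keep the fast-decaying component as "false alarm"
        lam_fa, lam_hit, pi_fa = lam_hit, lam_fa, 1.0 - pi_fa
    return pi_fa, lam_fa, lam_hit


def bayes_threshold(pi_fa, lam_fa, lam_hit, cost_fa=1.0, cost_miss=1.0):
    """Score where cost_miss*P(hit, s) equals cost_fa*P(false alarm, s)."""
    if lam_fa <= lam_hit:
        raise ValueError("false-alarm rate must exceed hit rate")
    num = cost_fa * pi_fa * lam_fa
    den = cost_miss * (1.0 - pi_fa) * lam_hit
    return max(0.0, math.log(num / den) / (lam_fa - lam_hit))


if __name__ == "__main__":
    # Synthetic candidate scores for one query term (made-up data).
    rng = random.Random(0)
    scores = [rng.expovariate(20.0) for _ in range(300)]
    scores += [rng.expovariate(2.0) for _ in range(100)]
    pi_fa, lam_fa, lam_hit = fit_exponential_mixture(scores)
    print("threshold:", bayes_threshold(pi_fa, lam_fa, lam_hit))
```

Candidates scoring above the returned threshold would be accepted as detections for that query. Under these assumptions, raising `cost_miss` relative to `cost_fa` lowers the threshold, trading more false alarms for fewer misses.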
Keywords/Search Tags: Spoken Term Detection, Weighted Finite-state Transducer, Lattice, Confusion Network, Out-of-Vocabulary, Phonetic Confusion Model, Relevance Score Distribution, Term Specific Thresholding, K-means Clustering