
Study And Implementation Of Content-based Mandarin Spoken Term Detection System

Posted on: 2012-07-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Li
GTID: 1118330362467921
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of computer science and Internet technology, a large amount of audio has been recorded and stored. As these recordings accumulate, finding the pieces a user is interested in within huge volumes of speech has become an active research topic.

Building on speech recognition technology, a series of spoken information retrieval schemes has been developed. One of them, spoken term detection, uses large vocabulary continuous speech recognition (LVCSR) to convert speech into symbol sequences, which are then indexed for fast search. This scheme has been widely studied because of its broad applicability and high efficiency. However, the performance and speed of both speech recognition and retrieval are the main factors restricting its development, and these key issues must be improved further before spoken term detection becomes practical. In addition, because of the particularities of Mandarin, less work has been done on Mandarin than on English. A key issue that urgently needs study is therefore how to improve the performance of Mandarin spoken term detection systems.

This dissertation studies content-based Mandarin spoken term detection, focusing on speech recognition and retrieval. Building on prior research, it first develops a fast speech recognition system without loss of recognition performance, intended as a good front end for spoken term detection. It then improves existing retrieval methods, focusing on the word-based spoken term detection system, and also discusses the syllable-based system and the confusion between the different systems. The aim is to build a spoken term detection system with high performance yet low time complexity.

To attain this aim, we use weighted finite-state transducers (WFSTs) to build the speech recognition networks.
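For readers unfamiliar with WFST-based recognition networks, the core operation for combining component transducers is composition. The sketch below shows textbook composition of two ε-free weighted transducers in the tropical semiring, where weights (for example negative log probabilities) add along a path. It is a generic illustration under stated assumptions, not the dissertation's Synchronous Prune Composition Algorithm: it does no pruning, handles no ε labels (those require a composition filter), and does not trim dead-end states. The `WFST` class, the `compose` function, and the toy syllable and word labels are hypothetical names chosen for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WFST:
    """A minimal weighted transducer (hypothetical helper, for illustration only)."""
    start: int
    finals: dict                                # state -> final weight
    arcs: dict = field(default_factory=dict)    # state -> [(ilabel, olabel, weight, next_state)]

    def add_arc(self, src, ilabel, olabel, weight, dst):
        self.arcs.setdefault(src, []).append((ilabel, olabel, weight, dst))


def compose(a: WFST, b: WFST) -> WFST:
    """Compose two epsilon-free WFSTs over the tropical semiring.

    An arc of `a` and an arc of `b` are paired when the output label of the
    first equals the input label of the second; their weights are added.
    Only state pairs reachable from the start pair are built (no pruning,
    no epsilon handling, no trimming of dead-end states).
    """
    start_pair = (a.start, b.start)
    ids = {start_pair: 0}                       # (state_a, state_b) -> new state id
    out = WFST(start=0, finals={})
    queue = deque([start_pair])
    while queue:
        pair = queue.popleft()
        qa, qb = pair
        src = ids[pair]
        # A pair is final only if both component states are final.
        if qa in a.finals and qb in b.finals:
            out.finals[src] = a.finals[qa] + b.finals[qb]
        for ia, oa, wa, na in a.arcs.get(qa, []):
            for ib, ob, wb, nb in b.arcs.get(qb, []):
                if oa != ib:
                    continue                    # labels must match to pair the arcs
                nxt = (na, nb)
                if nxt not in ids:
                    ids[nxt] = len(ids)
                    queue.append(nxt)
                out.add_arc(src, ia, ob, wa + wb, ids[nxt])
    return out


if __name__ == "__main__":
    # Made-up toy example: a syllable-to-word transducer with two candidate
    # words for "shi4", composed with a grammar acceptor that only allows "是".
    syl2word = WFST(start=0, finals={1: 0.0})
    syl2word.add_arc(0, "shi4", "是", 0.5, 1)
    syl2word.add_arc(0, "shi4", "事", 1.25, 1)
    grammar = WFST(start=0, finals={1: 0.0})
    grammar.add_arc(0, "是", "是", 0.25, 1)
    net = compose(syl2word, grammar)
    print(net.arcs)    # {0: [('shi4', '是', 0.75, 1)]}
    print(net.finals)  # {1: 0.0}
```

In a real recognizer the transducers being composed (for example a lexicon and a language model) are far larger and contain ε-transitions, so the size of the composed network and the handling of ε-arcs become the main costs.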
Methods such as the "Synchronous Prune Composition Algorithm", the "Transition Number based ε-Removal Algorithm", and "Dictionary-Shifting based Network Building" are developed to optimize the recognition network, and a "Trellis based Lattice Generating" algorithm is presented for fast speech recognition. These improvements are integrated into a speech recognition system, WDecode, which is 6.7 to 9.5 times faster than HDecode and 3.6 to 4.7 times faster than Juicer.

Building on WDecode, we then turn to spoken term detection. A "Term Expansion Retrieval Method" is developed to improve the word-based spoken term detection system; with this method, the system's equal error rate (EER) is improved by 41.85% and 41.00% relative to the word-based baseline system. In addition, we present a method called "Term Group Retrieval" to improve the system's speed, achieving a relative acceleration of 43.52% to 72.03%.

Based on the above study and improvements, we have finally established a content-based Mandarin spoken term detection system, which is shown to perform speech information retrieval quickly and with high performance.
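The abstract describes the underlying scheme as converting speech into symbol sequences and indexing them for fast search. A minimal sketch of that idea, assuming the recognizer's 1-best word output with per-word times and confidence scores, is an inverted index from each word to its occurrences, where a multi-word query matches consecutive positions in the same utterance. Real systems typically index lattices rather than the 1-best string, and this sketch does not reproduce the dissertation's Term Expansion or Term Group Retrieval methods. The names `Hit`, `build_index`, and `search`, and the product-of-confidences score, are assumptions for illustration. Because Mandarin text has no spaces, the query is assumed to be already segmented into the recognizer's vocabulary words.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    utt: str        # utterance ID
    pos: int        # word position in the recognized sequence
    start: float    # start time in seconds
    end: float      # end time in seconds
    score: float    # confidence of the recognized word (assumed, e.g. a posterior)


def build_index(transcripts):
    """Map each recognized word to every place it occurs.

    `transcripts` maps an utterance ID to a list of (word, start, end, score)
    tuples, e.g. the 1-best output of a decoder.
    """
    index = defaultdict(list)
    for utt, words in transcripts.items():
        for pos, (word, start, end, score) in enumerate(words):
            index[word].append(Hit(utt, pos, start, end, score))
    return index


def search(index, term, threshold=0.0):
    """Find occurrences of a (possibly multi-word, space-segmented) term.

    A multi-word term matches when its words appear at consecutive positions
    in the same utterance; the detection score is the product of word scores.
    Returns (utt, start, end, score) tuples with score >= threshold.
    """
    words = term.split()
    results = []
    for first in index.get(words[0], []):
        score, end = first.score, first.end
        for offset, word in enumerate(words[1:], start=1):
            # Look for the next query word right after the previous one.
            nxt = next((h for h in index.get(word, [])
                        if h.utt == first.utt and h.pos == first.pos + offset), None)
            if nxt is None:
                break
            score *= nxt.score
            end = nxt.end
        else:  # runs only if every query word was found in sequence
            if score >= threshold:
                results.append((first.utt, first.start, end, score))
    return results


if __name__ == "__main__":
    # Made-up recognizer output for two utterances.
    transcripts = {
        "utt1": [("语音", 0.0, 0.5, 0.5), ("检索", 0.5, 1.0, 0.5), ("系统", 1.0, 1.5, 1.0)],
        "utt2": [("检索", 0.0, 0.5, 1.0)],
    }
    idx = build_index(transcripts)
    print(search(idx, "语音 检索"))  # [('utt1', 0.0, 1.0, 0.25)]
```

Indexing once and answering many queries by dictionary lookup is what makes this kind of scheme fast: in this sketch, query cost depends on the number of postings for the query words rather than on the length of the audio archive.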
Keywords/Search Tags: spoken term detection, weighted finite-state transducer, lattice, term expansion, term group retrieval