Design And Implementation Of Voice Dialer For Embedded Systems

Posted on: 2006-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: D F Chen
Full Text: PDF
GTID: 2178360182483501
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of wireless communications and Personal Digital Assistants (PDAs), applying speech recognition technologies to embedded systems and devices has become a research hot spot. However, because of the memory and speed limitations of PDAs, speech recognition on these devices fails to deliver optimum performance and needs tailor-made improvement. To find a solution, we conducted research on a speaker-independent speech command recognition system and a voice dialing system based on Pocket PC, an embedded system platform. Aiming to overcome the constraints of memory space and computation speed while keeping recognition performance degradation acceptable, we took the following steps to reduce the time and space complexity of the voice dialing system:

(1) We studied how to choose an appropriate acoustic recognition unit and how to do acoustic modeling with lower-dimensional acoustic features under acceptable performance degradation. After extensive experimental comparisons, we decided to use the Extended Initial/Final (XIF) as the acoustic recognition unit. Additionally, through experiments, we found a more compact acoustic feature for acoustic modeling and developed a new acoustic modeling method based on acoustic recognition unit concatenation and full word recognition, which has proved suitable for an embedded voice dialing system.

(2) We studied how the rank of the correct token path during Viterbi decoding depends dynamically on the input speech frames, i.e., the frames of the input feature sequence, and how the score differences between token paths relate to the input speech frames in Viterbi beam search. Based on this, we proposed a new search strategy named Dynamically-Adjustable Histogram Pruning, which integrates the score differences of token paths and significantly reduces redundant computation and accelerates search decoding.

(3) After studying the repeated computation of likelihood scores, we applied a likelihood score lookup table to further accelerate Viterbi decoding.

(4) After studying the problems of zero drift and an initial all-zero waveform segment that arise when traditional endpoint detection technologies are used in practical embedded applications, we made corresponding improvements to enhance the accuracy of Voice Activity Detection (VAD).

Through extensive experiments, we combined the methods proposed above to design and implement a practical speaker-independent, user-definable voice dialing speech recognition system. Practical tests on a PDA device show that, on a randomly chosen vocabulary of 200 Chinese words, the recognition accuracy reaches 98.70%. Furthermore, compared with the baseline system using the standard Viterbi decoding method, recognition is 80 times faster and decoding uses 30% less memory.
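
As an illustration of the pruning idea in step (2), the following is a minimal sketch of histogram (rank-based) pruning combined with a score-difference beam, where the cap on active token paths changes with the frame index. It is not the thesis's actual algorithm: the function names, the linear schedule, and all default values are hypothetical, and the direction and shape of the schedule are assumptions made only for the example.

```python
from typing import Dict, Hashable


def histogram_prune(tokens: Dict[Hashable, float],
                    max_active: int,
                    beam: float) -> Dict[Hashable, float]:
    """Keep at most max_active tokens whose log score is within beam of the best."""
    if not tokens:
        return {}
    best = max(tokens.values())
    # Score-difference (beam) pruning: drop paths far below the best path.
    survivors = {s: v for s, v in tokens.items() if best - v <= beam}
    # Histogram (rank) pruning: cap the number of active paths.
    if len(survivors) > max_active:
        ranked = sorted(survivors.items(), key=lambda kv: kv[1], reverse=True)
        survivors = dict(ranked[:max_active])
    return survivors


def max_active_for_frame(frame: int,
                         start_cap: int = 8,
                         floor_cap: int = 2,
                         decay_frames: int = 10) -> int:
    """Hypothetical schedule: shrink the cap linearly from start_cap to floor_cap."""
    if frame >= decay_frames:
        return floor_cap
    return start_cap - ((start_cap - floor_cap) * frame) // decay_frames


# Example: prune one frame's token paths (state -> log score).
frame_tokens = {"a": -1.0, "b": -2.0, "c": -3.0, "d": -50.0}
kept = histogram_prune(frame_tokens, max_active=2, beam=10.0)
```

In a real decoder, `histogram_prune` would be called once per frame with `max_active_for_frame(t)` as the cap, so that fewer paths are expanded wherever the schedule allows it.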
Keywords/Search Tags: Speech Recognition, Voice Dialing, Personal Digital Assistant (PDA), Dynamically-Adjustable Histogram Pruning