Research On BN Feature Based Acoustic Modeling And Its Application In Keyword Retrieval

Posted on: 2016-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Liu
GTID: 2308330467494914
Subject: Information and Communication Engineering
Abstract/Summary:
Speech retrieval based on large vocabulary continuous speech recognition is an important research direction in multimedia retrieval. This thesis focuses on the speech recognition and keyword search stages of speech retrieval and covers three aspects: first, building a bottleneck (BN) feature based acoustic model for speech recognition; second, proposing an optimization method for the BN feature extractor to further improve recognition accuracy; third, building a keyword retrieval system and proposing optimization methods to improve retrieval performance.

BN feature based acoustic modeling combines the methods and advantages of Gaussian mixture model (GMM) and deep neural network (DNN) based recognition systems. BN features are extracted by a neural network with a narrow hidden layer and then used to train a GMM based recognition system, whose performance is further improved by discriminative training techniques. To address the problems of low-resource languages, this thesis proposes several optimizations of the acoustic model, such as tone features and tone modeling, noise processing, and cross-lingual techniques.

To further improve the performance of the BN feature based recognition system, this thesis proposes a method to optimize the BN feature extractor. Two discriminative objective functions, the maximum mutual information criterion (sMMI) and the minimum phone error (MPE) criterion, are adopted to optimize the network and model parameters. Two training schemes are used to update the network: updating only the last layer, and updating all parameters of the network. The optimization uses all the training data to compute the gradient, which makes it suitable for parallel computing.

Because speech recognition output contains errors, keywords need to be searched in the candidate word lattice. First, this thesis adopts confusion network and weighted finite state transducer (WFST) retrieval methods to obtain each keyword's position and score in the word lattice. Second, thresholding, term-dependent thresholding, and sum-to-one normalization are used for the confidence decision. Finally, this thesis proposes several retrieval optimizations, namely building a keyword-related language model, system combination, and combining the confusion network with WFST, which improve retrieval performance.

Targeting the NIST STD2006, OpenKWS2013, and OpenKWS2014 retrieval evaluations, this thesis builds speech recognition and keyword retrieval systems for English, Vietnamese, and Tamil, respectively. The experimental results verify the validity of the proposed recognition and retrieval methods.
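
The confidence decision step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "sum to one" means rescaling the scores of all detections of a keyword so that they sum to 1, followed by a single global threshold. The function names, the `gamma` exponent, the threshold value, and the sample hits are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def sum_to_one(hits, gamma=1.0):
    """Rescale hit scores so that the scores of each keyword sum to 1.

    hits:  list of (keyword, start_time, score) tuples, score > 0.
    gamma: hypothetical sharpening exponent; 1.0 is plain normalization.
    """
    totals = defaultdict(float)
    for keyword, _, score in hits:
        totals[keyword] += score ** gamma
    return [(kw, t, (s ** gamma) / totals[kw]) for kw, t, s in hits]


def decide(hits, threshold=0.5):
    """Attach a YES/NO decision (True/False) to every hit using one global threshold."""
    return [(kw, t, s, s >= threshold) for kw, t, s in hits]


# Made-up lattice hits: (keyword, start time in seconds, raw posterior score).
raw_hits = [
    ("hanoi", 12.4, 0.30),
    ("hanoi", 57.1, 0.10),
    ("market", 3.8, 0.05),
]

normalized = sum_to_one(raw_hits)
decisions = decide(normalized, threshold=0.5)

for kw, t, s, accepted in decisions:
    print(f"{kw:8s} t={t:6.1f}s score={s:.2f} accepted={accepted}")
```

In this sketch the single low-scoring hit for "market" is rescaled to 1.0 and accepted, which shows why this kind of normalization tends to help rare keywords whose raw scores would otherwise fall below a global threshold.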
Keywords/Search Tags: Speech Retrieval, Speech Recognition, Deep Neural Network