A Generic, Scalable Architecture for a Large Acoustic Model and Large Vocabulary Speech Recognition Accelerator

Posted on:2014-10-02

Degree:Ph.D

Type:Dissertation

University:North Carolina State University

Candidate:Bapat, Ojas Ashok


GTID:1458390008461095

Subject:Engineering

Abstract/Summary:

This dissertation describes a scalable hardware accelerator for speech recognition. We propose a generic hardware architecture that can be used with multiple software packages that perform HMM-based speech recognition. We implement a two-pass decoding algorithm with an approximate N-best, time-synchronous Viterbi beam search. The observation probability calculation (senone scoring) and the first pass of decoding, which uses a simple language model, are implemented in hardware. The word lattice output by this first pass is used by the software for the second pass, which applies a more sophisticated N-gram language model. This allows us to use a very large and generic language model in our hardware. We opt for a logic-on-memory approach that uses high-bandwidth NOR flash memory to improve random read performance for senone scoring and first-pass decoding, both of which are memory-intensive operations. For senone scoring, we store all of the acoustic model data in NOR flash memory. For decoding, we partition the data accesses among DRAM, SRAM and NOR flash, which allows these accesses to proceed in parallel and improves performance. We arrange our data structures so that the DRAM is accessed entirely sequentially, thereby improving memory access efficiency. We use techniques such as block scoring and caching of word and HMM models to reduce the overall power consumption and further improve performance. Using a word lattice to communicate between hardware and software keeps the communication overhead low compared to any other partitioning scheme. This architecture provides a speedup of 4.3x over a 2.4 GHz Intel Core 2 Duo processor running the CMU Sphinx recognition software, while consuming an estimated 1.72 W of power. The hardware accelerator improves speech recognition accuracy by supporting larger acoustic models and word dictionaries while maintaining real-time performance.
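
To make the first-pass search concrete, the following is a minimal software sketch of a time-synchronous Viterbi beam search over a toy two-state HMM. It is not the dissertation's hardware design: it keeps only the single best path per state rather than an approximate N-best word lattice, and the names `viterbi_beam_search`, `prune`, the `beam` parameter, and the toy model values are all assumptions introduced for illustration. Scores are natural-log probabilities, and the per-frame observation scores stand in for senone scores.

```python
import math


def prune(hyps, beam):
    """Keep only hypotheses whose score is within `beam` of the frame's best."""
    if not hyps:
        return hyps
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def viterbi_beam_search(log_init, log_trans, log_obs, beam):
    """Time-synchronous Viterbi search with beam pruning.

    log_init:  dict state -> log initial probability
    log_trans: dict state -> list of (next_state, log transition probability)
    log_obs:   list over frames of dict state -> log observation score
    beam:      pruning width in log units
    Returns (best_log_score, best_state_path).
    """
    # Hypotheses after the first frame: state -> (score, path so far).
    active = {s: (lp + log_obs[0][s], [s])
              for s, lp in log_init.items() if s in log_obs[0]}
    active = prune(active, beam)

    # All surviving hypotheses advance together, one frame at a time.
    for frame in log_obs[1:]:
        nxt = {}
        for s, (score, path) in active.items():
            for t, lt in log_trans.get(s, []):
                if t not in frame:
                    continue
                cand = score + lt + frame[t]
                # Viterbi recombination: keep the best path into each state.
                if t not in nxt or cand > nxt[t][0]:
                    nxt[t] = (cand, path + [t])
        active = prune(nxt, beam)

    if not active:
        raise ValueError("beam pruned every hypothesis")
    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state]


# Hypothetical toy model (values chosen only for illustration).
LOG_INIT = {"A": math.log(0.6), "B": math.log(0.4)}
LOG_TRANS = {
    "A": [("A", math.log(0.7)), ("B", math.log(0.3))],
    "B": [("A", math.log(0.4)), ("B", math.log(0.6))],
}
LOG_OBS = [
    {"A": math.log(0.9), "B": math.log(0.1)},
    {"A": math.log(0.2), "B": math.log(0.8)},
    {"A": math.log(0.1), "B": math.log(0.9)},
]

score, path = viterbi_beam_search(LOG_INIT, LOG_TRANS, LOG_OBS, beam=1.0)
```

A narrower `beam` discards more hypotheses per frame, which reduces work and memory traffic at the risk of pruning the path that would eventually have scored best; a wide beam approaches exhaustive Viterbi search.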

Keywords/Search Tags:

Speech recognition, Model, Hardware, Large, Generic, Acoustic, Architecture, NOR flash

Related items

1	Application Of Convolutional Neural Network In Large Vocabulary Continuous Speech Recognition
2	Researching Of The Mongolian Acoustic Model Based On Speech Recognition
3	Acoustic Modeling For Continuous Speech Recognition
4	Acoustic Model Of Chinese Speech Recognition Based On DNN
5	A Study On The Extraction Of Speech Depth In Tibetan Language And Its Speech Recognition
6	Acoustic modeling and feature selection for speech recognition
7	Research On Acoustic Modeling For Spontaneous Spoken Speech Recognition
8	Research On Acoustic Model Of Speech Recognition In Educational Scene Based On Deep Learning
9	The Study On Acoustic Model Based Neural Network In Mongolian Speech Recognition System
10	Acoustic modeling for automatic speech recognition: Deriving discriminative Gaussian networks