A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques

Posted on: 2012-10-21
Degree: M.S.C.S
Type: Thesis
University: The University of Texas at Dallas
Candidate: Singh, Amriteshwar
Full Text: PDF
GTID: 2458390011452550
Subject: Computer Science
Abstract/Summary:
Multimodal system design has gained popularity in recent years because combining different input modalities can improve system performance. This thesis presents a multimodal approach to automatic post sorting that combines Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR). For the purpose of fusion, we propose a new confidence measure for OCR output. Based on this confidence measure, we develop a dynamic fusion strategy that bases its final decision on (i) the OCR output alone, (ii) the ASR output alone, or (iii) a combination of the ASR and OCR outputs. The combination is performed in two steps. First, the ASR output is used to automatically generate a list of alternate address candidates. Next, this list is processed by the OCR to determine the most likely address. The proposed system is evaluated on speech data derived from the UT-Accent corpus and images obtained from the Siemens Mobility-Postal Automation database. Both the speech and image data were collected in realistic settings and contain large real-world variability. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 88.9%, a substantial improvement over ASR alone (79%) and OCR alone (80%). This advance shows that combining the two technologies improves the overall recognition of package addresses.
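
The three-way decision described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not state how the OCR confidence measure is computed, which confidence ranges select which branch, or how candidates are rescored. The names `OcrResult`, `rescore_with_ocr`, and `fuse`, the thresholds `high` and `low`, the mapping of high confidence to OCR alone and low confidence to ASR alone, and the string-similarity scorer are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class OcrResult:
    text: str          # string read from the address image
    confidence: float  # hypothetical confidence score in [0, 1]


def rescore_with_ocr(ocr_text: str, candidates: List[str]) -> str:
    """Return the candidate most similar to the OCR reading.

    Stand-in for "the list is processed by the OCR": plain string
    similarity is an assumption, not the scoring used in the thesis.
    """
    return max(candidates,
               key=lambda c: SequenceMatcher(None, ocr_text, c).ratio())


def fuse(ocr: OcrResult, asr_text: str, asr_candidates: List[str],
         high: float = 0.8, low: float = 0.3) -> str:
    """Choose among OCR alone, ASR alone, or the two-step combination.

    The thresholds and the branch assigned to each confidence range
    are assumed values for this sketch.
    """
    if ocr.confidence >= high:
        return ocr.text    # (i) OCR output alone
    if ocr.confidence < low or not asr_candidates:
        return asr_text    # (ii) ASR output alone
    # (iii) combination: candidates derived from ASR, selected via OCR
    return rescore_with_ocr(ocr.text, asr_candidates)


if __name__ == "__main__":
    # Made-up zip codes; the OCR reading confuses the letter O with zero.
    candidates = ["75081", "75080"]
    print(fuse(OcrResult("75O80", 0.5), "75081", candidates))
```

In this sketch the candidate list is passed in directly; in a fuller system it would be generated from the ASR output, for example from alternative recognition hypotheses, which is again an assumption here.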
Keywords/Search Tags: OCR, Recognition, ASR, System, Multimodal, Automatic, Speech, Fusion