A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques

Posted on:2012-10-21

Degree:M.S.C.S

Type:Thesis

University:The University of Texas at Dallas

Candidate:Singh, Amriteshwar

Full Text:PDF

GTID:2458390011452550

Subject:Computer Science

Abstract/Summary:

Multimodal system design has gained popularity in recent years because combining different input modalities can improve system performance. This thesis presents a multimodal approach to automatic post sorting that combines Automatic Speech Recognition (ASR) and Optical Character Recognition (OCR). For the purpose of fusion, we propose a new confidence measure for OCR output. Based on this confidence measure, we develop a dynamic fusion strategy that bases its final decision on (i) the OCR output alone, (ii) the ASR output alone, or (iii) a combination of the ASR and OCR outputs. The combination of ASR and OCR outputs is performed in two steps. First, the ASR output is used to automatically generate a list of alternate address candidates. Next, this list of alternate addresses is processed by the OCR to determine the most likely address. The proposed system is evaluated on speech data derived from the UT-Accent corpus and images obtained from the Siemens Mobility-Postal Automation database. Both speech and image data were collected in realistic settings and contain large real-world variability. Our experiments show that the proposed multimodal solution achieves an overall zip code recognition rate of 88.9%, a substantial improvement over ASR alone (79%) and OCR alone (80%). This advance is a contribution that leverages both technologies to improve overall address recognition on packages.
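
The three-way decision described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify how the OCR confidence measure is computed or which confidence ranges trigger which branch, so the two thresholds (`high`, `low`), the mapping of confidence ranges to branches, and the helpers `expand` (ASR-driven candidate generation) and `ocr_score` (OCR re-scoring of a candidate) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str          # recognized address or zip code string
    confidence: float  # assumed to be normalized to [0, 1]


def fuse(
    ocr: Hypothesis,
    asr: Hypothesis,
    expand: Callable[[str], List[str]],   # hypothetical: ASR output -> candidate list
    ocr_score: Callable[[str], float],    # hypothetical: OCR score for one candidate
    high: float = 0.8,                    # assumed threshold, not from the thesis
    low: float = 0.3,                     # assumed threshold, not from the thesis
) -> str:
    """Pick a final address using OCR alone, ASR alone, or both."""
    # (i) OCR is confident enough: trust the OCR output alone.
    if ocr.confidence >= high:
        return ocr.text
    # (ii) OCR is very unreliable: fall back to the ASR output alone.
    if ocr.confidence < low:
        return asr.text
    # (iii) Combination, step 1: ASR output generates alternate candidates.
    candidates = expand(asr.text)
    if not candidates:
        return asr.text
    # (iii) Combination, step 2: OCR selects the most likely candidate.
    return max(candidates, key=ocr_score)
```

In this sketch the middle confidence band is where the two modalities are combined; a real system would tune or learn the thresholds on held-out data rather than fix them by hand.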

Keywords/Search Tags:

OCR, Recognition, ASR, System, Multimodal, Automatic, Speech, Fusion

Related items

1	Research And Application On Speech Recognition For Complex Scenes
2	The Research And Implementation Of Multimodal Interactive System For Space Robot
3	Research On Speech Emotion Recognition Based On Multimodal Information Fusion
4	Multimodal fusion with applications to audio-visual speech recognition
5	The Study Of Multimodal Emotion Recognition Based On Text, Speech And Video
6	Research On On/Off-Screen Speech Separation Algorithm Based On Multimodal Fusion
7	Audio-Visual Multi-Modal Fusion Approach Research And Application
8	Research On Multi-modal Emotion Recognition Algorithm Based On Speech And Face Expression
9	Research On Emotion Recognition Of Monomodal Speech And Multimodal Speech Vision Based On Transfer Learning
10	Driver Road Rage Recognition By Combining Facial Expression And Speech