Using Multiple Sequence Alignment And Statistical Language Model To Integrate Multiple Chinese Address Recognition Outputs

Posted on:2016-11-01

Degree:Master

Type:Thesis

Country:China

Candidate:S C Chen

Full Text:PDF

GTID:2308330461975780

Subject:Computer application technology

Abstract/Summary:

During the last few decades, automatic mail sorting systems have been widely deployed around China, and the underlying technology has progressed from postcode recognition to address recognition. However, recognizing Chinese addresses is still error-prone, especially when the images are of low quality and contain a lot of noise. The errors fall into three categories: 1) the character segmentation is right, but the character recognition result is wrong; 2) wrong character segmentation leads to false recognition; 3) confusion between Chinese characters and numerals leads to wrong output.

Different recognizers may make different mistakes when they are used to recognize the same Chinese address. In this paper, we present a method that combines the outputs of multiple Chinese address recognizers to improve recognition accuracy. The proposed method consists of three steps:
1) Align the recognizer outputs based on minimum edit distance and retrieve the optimal pairwise alignment, then extend the pairwise alignment to a multiple alignment.
2) Build the lattice of candidate hypotheses based on the optimal multiple alignment.
3) Use the Viterbi algorithm to select the maximum-likelihood candidate sequence based on a statistical language model.

We evaluate our method on two real data sets of Chinese printed envelope images captured from a sorting machine. The first data set (SRI1) consists of 1651 images (30,014 characters in total), which are address images segmented from envelope images. On SRI1, we test character recognition performance. The second data set (SRI2) consists of 3071 Chinese mail images in which the address area is not pre-segmented. On SRI2, we test whole-address recognition performance on the automatic mail sorting system. The experimental results show that our method outperforms both the single recognizers and Miyao's method.
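The three steps of the method can be illustrated with a minimal Python sketch. It is not the thesis implementation: it handles only two recognizer outputs (pairwise alignment, without the extension to multiple alignment), and the function names, the `GAP` marker, the toy bigram table `TOY_BIGRAMS`, and the `unseen` penalty are all assumptions made for illustration.

```python
GAP = ""  # hypothetical marker: one recognizer has no character at this position

def align_pair(a, b):
    """Step 1 (pairwise only): minimum-edit-distance alignment of two strings.

    Returns a list of columns (char_from_a, char_from_b), using GAP where
    one string has an insertion or deletion relative to the other.
    """
    n, m = len(a), len(b)
    # dp[i][j] = edit distance between a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (a[i - 1] != b[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            cols.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            cols.append((a[i - 1], GAP))
            i -= 1
        else:
            cols.append((GAP, b[j - 1]))
            j -= 1
    cols.reverse()
    return cols

def build_lattice(cols):
    """Step 2: each aligned column becomes a set of distinct candidates."""
    return [sorted(set(col)) for col in cols]

def viterbi(lattice, bigram_logp, unseen=-10.0):
    """Step 3: pick the highest-scoring path under a bigram language model.

    The Viterbi state is the last emitted character; choosing GAP emits
    nothing and keeps the state unchanged.
    """
    best = {"<s>": (0.0, "")}  # state -> (log-probability, text so far)
    for column in lattice:
        nxt = {}
        for last, (score, text) in best.items():
            for c in column:
                if c == GAP:
                    key, cand = last, (score, text)
                else:
                    key = c
                    cand = (score + bigram_logp.get((last, c), unseen), text + c)
                if key not in nxt or cand[0] > nxt[key][0]:
                    nxt[key] = cand
        best = nxt
    return max(best.values(), key=lambda v: v[0])[1]

# Toy bigram log-probabilities (invented values, not from the thesis).
TOY_BIGRAMS = {
    ("北", "京"): -1.0, ("京", "市"): -1.0, ("市", "海"): -1.0,
    ("海", "淀"): -1.0, ("淀", "区"): -1.0,
}

columns = align_pair("北京市海淀区", "北京币海区")  # second output has two errors
result = viterbi(build_lattice(columns), TOY_BIGRAMS)
print(result)
```

In this toy case one recognizer output contains a substituted character and a dropped character; the alignment places the two outputs column by column, and the language model then prefers the candidates that form likely character pairs. A real system would need a language model trained on address text and some treatment of path length, since paths with more emitted characters accumulate more log-probability terms.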

Keywords/Search Tags:

multiple sequence alignment, multiple Chinese address recognition, statistical language model, minimum edit distance

Related items

1	Research On Protein Multiple Sequence Alignment Algorithms And Assessment Of Their Performance
2	Research Of Improvement And Parallelization For Sequence Assembly And Multiple Sequence Alignment
3	Research On Multiple Sequence Alignment Algorithms In Bioinformatics
4	Parallel Optimization For Multiple Sequence Alignment Based On CPU-GPU Heterogeneous System
5	Research On Multiple Sequence Alignment Algorithm Of Bioinformatics
6	Statistical Models And Algorithms For Aligning Multiple Sequences
7	Biological Sequence The Algorithm Kalign's Research Analysis
8	Research Of Multiple Sequence Alignment Based On The Lempel-Ziv
9	Research Of Parallel Method For GPU-based Multiple Sequence Relevance Analysis
10	The Structuring&Matching Of Address In Chinese Address Recognition System