English Address Image Recognition And Translation

Posted on: 2012-08-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: X Tu
GTID: 1118330335465545  Subject: Computer application technology
Abstract/Summary:
Economic globalization has increased the volume of international mail that arrives from abroad with addresses written in English. To keep delivery efficient and accurate, post offices need professional staff to translate each English address into its Chinese equivalent and mark the translation on the envelope. Automatically translating English addresses into Chinese has therefore become a significant problem in postal automation. Mature character recognition technology and rapidly advancing machine translation technology together make a solution feasible.

The proposed English-to-Chinese address translation system captures a grayscale image of an envelope, segments the English address from the image by document image segmentation, extracts the text by OCR, and then automatically translates the OCR output from English into Chinese. The system draws on document image processing, natural language processing, machine translation, data mining and artificial intelligence.

The main achievements of the thesis are as follows.

Based on the characteristics of envelope images, we propose a window localization method based on gray-level gradient features and an address localization method based on component analysis, which together extract the address area from an envelope image. We also present a fast, efficient run-based algorithm for labeling connected components in images; it obtains the bounding boxes of the connected components with only a single scan over the image.

We propose two address understanding methods, one based on maximum matching and one based on flexible string matching and deterministic finite automata, to extract information such as the road name, building name and house number from the address text.
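The abstract does not describe the run-based labeling algorithm in detail. The Python sketch below shows one common way to get bounding boxes in a single scan, under assumptions that are not from the thesis: the image is a binary list of rows (1 = foreground), components are 8-connected, and a union-find structure merges bounding boxes as soon as runs on adjacent rows are found to touch. The names `find_runs` and `component_boxes` are hypothetical.

```python
from typing import Dict, List, Tuple

# (min_row, min_col, max_row, max_col), all inclusive
Box = Tuple[int, int, int, int]


def find_runs(row: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) columns of each run of foreground pixels."""
    runs, start = [], None
    for x, v in enumerate(row):
        if v and start is None:
            start = x
        elif not v and start is not None:
            runs.append((start, x - 1))
            start = None
    if start is not None:
        runs.append((start, len(row) - 1))
    return runs


def component_boxes(image: List[List[int]]) -> List[Box]:
    """Label 8-connected components run by run; each pixel is read once."""
    parent: Dict[int, int] = {}
    boxes: Dict[int, Box] = {}  # kept only for root labels

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a: int, b: int) -> int:
        ra, rb = find(a), find(b)
        if ra == rb:
            return ra
        if rb < ra:
            ra, rb = rb, ra
        parent[rb] = ra
        b1, b2 = boxes.pop(rb), boxes[ra]
        # Merge the two bounding boxes into the surviving root.
        boxes[ra] = (min(b1[0], b2[0]), min(b1[1], b2[1]),
                     max(b1[2], b2[2]), max(b1[3], b2[3]))
        return ra

    prev: List[Tuple[int, int, int]] = []  # (start, end, label) of previous row
    next_label = 0
    for y, row in enumerate(image):
        cur = []
        for s, e in find_runs(row):
            label = next_label
            next_label += 1
            parent[label] = label
            boxes[label] = (y, s, y, e)
            for ps, pe, pl in prev:
                # Runs touch under 8-connectivity if columns overlap within 1.
                if ps <= e + 1 and pe >= s - 1:
                    label = union(label, pl)
            cur.append((s, e, label))
        prev = cur
    return sorted(boxes.values())
```

Because touching runs are merged while the scan proceeds, the boxes are complete when the last row has been read. For simplicity the sketch compares each run with every run of the previous row; a faster version would advance a pointer through the previous row's runs.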
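Maximum matching, as the term is commonly used in text segmentation, greedily takes the longest dictionary phrase that starts at the current position. A minimal sketch of how this could be applied to address text follows. It assumes a small hypothetical gazetteer of lower-cased phrases mapped to address fields, and it treats any purely numeric token as a house number. The gazetteer contents and the `max_match` name are illustrative, not taken from the thesis.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lower-cased phrase -> address field.
GAZETTEER: Dict[str, str] = {
    "nanjing road": "road",
    "nanjing west road": "road",
    "jin mao tower": "building",
    "shanghai": "city",
}


def max_match(address: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag address words by greedy forward maximum matching against a gazetteer."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    max_len = max(len(phrase.split()) for phrase in gazetteer)
    fields: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink it.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in gazetteer:
                fields.append((gazetteer[phrase], phrase))
                i += n
                break
        else:
            # No dictionary phrase starts here: fall back to a single word.
            word = words[i]
            fields.append(("house_number" if word.isdigit() else "unknown", word))
            i += 1
    return fields
```

For "88 Nanjing West Road, Shanghai" the sketch prefers "nanjing west road" over the shorter "nanjing road". Exact matching of this kind breaks down on abbreviations and OCR errors ("Nanjing Rd" yields two unknown tokens), which is the situation the abstract says the flexible-matching method handles better.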
Experiments show that, of these two address understanding methods, the maximum matching method extracts address information with high accuracy, while the method based on flexible string matching and finite automata recovers more address information when OCR errors are present.

We present a data mining method based on the core and βDP interval reduct of the variable rough set model to obtain word sense disambiguation rules. A core-based βDP interval reduct algorithm derives reducts of the condition features from the decision table, and disambiguation rules are then generated from these reducts. Data mining methods based on the traditional rough set model require highly accurate classified data and generate less general rules. By contrast, the proposed method obtains both common rules and rules for special cases, and it keeps the quality of the rules controllable.

We propose a flexible string matching method based on block distance, which adds substring moves to the basic edit operations of deletion, insertion and substitution. The method measures the similarity between two address texts that have the same meaning but a different word order. Experimental results show that the method makes the system more robust to OCR errors.

We have integrated the above methods into an English-to-Chinese address translation system, which has been implemented and successfully deployed at the Shanghai Post Office Mail Center, with good economic and social benefits.
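The abstract does not specify the core-based βDP interval reduct algorithm. The sketch below therefore illustrates only the underlying variable precision rough set idea, using the convention in which β ∈ (0.5, 1] is a minimum inclusion degree. Under this convention, an equivalence class of the condition features yields a rule when at least a fraction β of its rows share one decision, and the β quality of classification is the share of rows covered by such classes. With β = 1 this reduces to the classical rough set positive region. The toy decision table, its feature names, and the `beta_rules` function are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]

# Hypothetical toy table for disambiguating the word "park" in addresses.
TABLE: List[Row] = [
    {"next_word": "road", "prev_is_number": "no", "sense": "road_name"},
    {"next_word": "road", "prev_is_number": "no", "sense": "road_name"},
    {"next_word": "road", "prev_is_number": "no", "sense": "road_name"},
    {"next_word": "road", "prev_is_number": "no", "sense": "park"},
    {"next_word": "none", "prev_is_number": "no", "sense": "park"},
    {"next_word": "none", "prev_is_number": "yes", "sense": "road_name"},
]


def beta_rules(table: Sequence[Row], conditions: Sequence[str],
               decision: str, beta: float = 0.8):
    """Return (beta quality of classification, rules) for a decision table.

    Each rule is (condition values, decision value, inclusion degree).
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must lie in (0.5, 1]")
    # Partition rows into equivalence classes over the condition features.
    classes: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in table:
        classes[tuple(row[c] for c in conditions)].append(row[decision])
    rules, covered = [], 0
    for key, decisions in classes.items():
        value, count = Counter(decisions).most_common(1)[0]
        degree = count / len(decisions)
        if degree >= beta:  # class lies in the beta-positive region
            covered += len(decisions)
            rules.append((dict(zip(conditions, key)), value, degree))
    return covered / len(table), rules
```

On this toy table, β = 0.7 keeps the noisy class (next_word = road, prev_is_number = no) as a rule with inclusion degree 0.75 and gives a quality of 1.0. With β = 1.0, that class is dropped and the quality falls to 1/3. Choosing β in this way is one reading of how the variable model trades rule generality against rule precision.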
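The abstract does not give the exact definition of block distance, and edit distance with substring moves is expensive to compute exactly. The sketch below is a greedy word-level approximation, not the thesis's method. It repeatedly pairs up the longest common run of unmatched words (in the spirit of greedy string tiling). It then charges one move for each paired block that is out of order, computed as the number of blocks minus the longest increasing subsequence of their positions. Finally, it charges ordinary Levenshtein operations for the leftover words. All function names are hypothetical.

```python
import re
from bisect import bisect_left
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def greedy_tiles(a: List[str], b: List[str]):
    """Repeatedly mark the longest common run of still-unmarked words."""
    used_a, used_b = [False] * len(a), [False] * len(b)
    tiles: List[Tuple[int, int, int]] = []  # (start in a, start in b, length)
    while True:
        best_len, best_i, best_j = 0, 0, 0
        for i in range(len(a)):
            for j in range(len(b)):
                k = 0
                while (i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]
                       and not used_a[i + k] and not used_b[j + k]):
                    k += 1
                if k > best_len:
                    best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break
        for t in range(best_len):
            used_a[best_i + t] = used_b[best_j + t] = True
        tiles.append((best_i, best_j, best_len))
    return tiles, used_a, used_b


def lis_length(seq: List[int]) -> int:
    """Length of the longest strictly increasing subsequence."""
    tails: List[int] = []
    for v in seq:
        pos = bisect_left(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)


def levenshtein(x: List[str], y: List[str]) -> int:
    """Word-level edit distance (insertion, deletion, substitution)."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, 1):
        cur = [i]
        for j, yj in enumerate(y, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (xi != yj)))
        prev = cur
    return prev[-1]


def block_distance(s: str, t: str) -> int:
    a, b = tokenize(s), tokenize(t)
    tiles, used_a, used_b = greedy_tiles(a, b)
    # Blocks already in the same relative order need no move.
    order_in_b = [j for _, j, _ in sorted(tiles)]
    moves = len(tiles) - lis_length(order_in_b)
    rest_a = [w for w, u in zip(a, used_a) if not u]
    rest_b = [w for w, u in zip(b, used_b) if not u]
    return moves + levenshtein(rest_a, rest_b)
```

For "88 Century Avenue, Jin Mao Tower" against "Jin Mao Tower, 88 Century Avenue", the sketch charges a single block move. Plain word-level Levenshtein distance charges at least 3, because the two word sequences share only three words in a common order. An OCR error such as "Towr" adds one substitution on top of the move.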
Keywords/Search Tags: English address translation, flexible string matching, variable rough set, address localization and recognition