Research On Off-Line Arabic Text Recognition

Posted on: 2012-08-05  Degree: Master  Type: Thesis
Country: China  Candidate: Ammar Mohammed Ali Al-Tameemi  Full Text: PDF
GTID: 2248330377958439  Subject: Computer software and theory
Abstract/Summary:
The machine simulation of human reading has been the subject of intensive research for almost four decades. Because of its complexity, comparatively little research has been conducted on automatic Arabic text recognition. Optical Arabic text recognition is receiving renewed and extensive attention following the success of optical text recognition for other scripts such as Latin, Chinese and Japanese. Although a reasonable amount of work has been reported, much remains to be done in Arabic text recognition.

In this study, a set of feature extraction methods is used to obtain structural and geometrical representations of Arabic words. The system employs Support Vector Machines (SVMs) as the pattern recognition tool. Each shape of an Arabic word is treated as a separate class, so words are not segmented into characters. The proposed system consists of three phases. The first phase is preprocessing, which performs image binarization, line segmentation and word segmentation. The segmented words are then fed into the second phase, feature extraction. The extracted features consist of vertical sums computed over twenty sliding windows, four local maxima points in the vertical projection together with the center of gravity, the number of connected components, the positions of detected corners together with end points in the word image, and the mean value of the word image. The last phase is classification, in which multi-class SVMs with the one-against-all technique are used.

The proposed recognition system was evaluated on both printed and handwritten Arabic words. For printed text, five Arabic fonts were used: Andalus, Arial, Simplified Arabic, Tahoma and Traditional Arabic. Four categories of experiments were carried out. The first tests all fonts together using three datasets with 42 features; recognition rates of 97.344%, 88.727% and 88.582% were obtained. In contrast, the second tests each feature on all fonts; the recognition rates from these experiments are 95.287%, 90.446% and 80.690%, respectively. The third tests multiple features on each font using 15 different datasets; the recognition rates for all datasets are over 97%, with the best being 98.743%. The fourth tests the system on a dataset of handwritten Arabic words and yields a recognition rate of 94.884%.
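
To make the feature-extraction phase more concrete, the following is a minimal sketch of a few of the holistic word features named in the abstract: vertical sums over sliding windows, center of gravity, number of connected components, and mean value. It is not the thesis implementation. The function names, the use of non-overlapping equal-width windows, 8-connectivity, and the convention that ink pixels are 1 in a binarized image are all assumptions made for illustration. The local maxima and corner/end-point features are omitted.

```python
from collections import deque

# Assumption: a word image is a list of rows of 0/1 values, where 1 = ink.

def vertical_projection(img):
    """Number of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]

def window_sums(projection, n_windows=20):
    """Sum the projection over n_windows contiguous vertical strips.
    Non-overlapping, near-equal-width strips are an assumption."""
    width = len(projection)
    sums = []
    for k in range(n_windows):
        start = k * width // n_windows
        end = (k + 1) * width // n_windows
        sums.append(sum(projection[start:end]))
    return sums

def center_of_gravity(img):
    """(x, y) centroid of the ink pixels; (0.0, 0.0) for a blank image."""
    total = sum(sum(row) for row in img)
    if total == 0:
        return (0.0, 0.0)
    cy = sum(r * sum(row) for r, row in enumerate(img)) / total
    cx = sum(c * v for row in img for c, v in enumerate(row)) / total
    return (cx, cy)

def count_connected_components(img):
    """Count 8-connected groups of ink pixels with a breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
    return count

def mean_value(img):
    """Fraction of pixels that are ink."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))

def word_features(img, n_windows=20):
    """Concatenate the sketched features into one vector."""
    cx, cy = center_of_gravity(img)
    return (window_sums(vertical_projection(img), n_windows)
            + [cx, cy, count_connected_components(img), mean_value(img)])
```

A vector built this way for each word image could then be passed to any multi-class SVM trained in a one-against-all fashion, with one class per word shape as the abstract describes.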
Keywords/Search Tags: sliding window, curvature, local maxima, connected component, support vector machine