Research On Multi-Script Identification In Natural Images

Posted on:2015-06-14

Degree:Master

Type:Thesis

Country:China

Candidate:M J Piao

GTID:2298330431979186

Subject:Computer application technology

Abstract/Summary:

Multi-script identification in natural images is an important research problem for content-based image retrieval and for the development of multi-language OCR systems. With the growth of the information industry, the number of digital images has increased rapidly, so retrieving objects from large collections of stored images has significant practical value. However, retrieving images from large-scale databases both accurately and quickly remains an open problem. To date, most OCR systems are trained on a single language, so they become ineffective on unknown languages or on multi-script text. In natural images, text varies in the number of characters, font, size, coverage area, and spacing, so existing multi-script identification methods designed for text images lack the flexibility to handle it. To address this problem, this dissertation proposes a multi-script identification method for natural images based on text edge density, text arrangement rules, and PCA.

First, a text detection algorithm is presented that combines text edge density with text arrangement characteristics. The algorithm uses the Sobel gradient operator to detect image edges and then computes the image edge density. After the edges are preprocessed with morphological operations, text areas are detected using prior hypotheses about text arrangement.

Then, a multi-script identification method based on PCA is put forward. The first step is to build character sample sets for Korean, Chinese, and English. The corresponding eigenspaces are then built with PCA. Finally, the script is identified by measuring the similarity between the original character and its reconstructed character using Euclidean distance and KL distance.

Finally, an algorithm for multi-script identification in natural images is designed by combining the text detection method and the script identification method described above.

In experiments, the proposed method achieves success rates of 88.36% for text detection and 87.37% for multi-script identification. The results show that the method is effective and feasible for identifying the script (Korean, Chinese, or English) of detected text regions in natural images.
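The two stages summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation: the function names, the block size, the edge threshold, the number of principal components, and the way pixel intensities are normalized into distributions for the KL distance are all assumptions made for illustration. The morphological preprocessing and the text arrangement rules are not shown, because the abstract does not specify them.

```python
import numpy as np

def sobel_edge_density(gray, window=8, threshold=100.0):
    """Fraction of Sobel edge pixels in each non-overlapping window x window block.

    `window` and `threshold` are illustrative values, not taken from the thesis.
    """
    gray = np.asarray(gray, dtype=float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Apply both 3x3 Sobel kernels by summing shifted copies of the image.
    for i in range(3):
        for j in range(3):
            patch = padded[i:i + h, j:j + w]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    edges = np.hypot(gx, gy) > threshold
    bh, bw = h // window, w // window
    blocks = edges[:bh * window, :bw * window].reshape(bh, window, bw, window)
    return blocks.mean(axis=(1, 3))

def fit_eigenspace(samples, n_components):
    """PCA via SVD on flattened character images, one row per sample."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]  # rows of the basis are orthonormal

def reconstruct(x, eigenspace):
    """Project x onto the eigenspace and map it back to pixel space."""
    mean, basis = eigenspace
    return mean + (x - mean) @ basis.T @ basis

def euclidean_distance(a, b):
    return float(np.linalg.norm(a - b))

def kl_distance(a, b, eps=1e-9):
    """KL divergence between two images treated as distributions.

    Assumption: negative values are clipped and intensities are normalized
    to sum to 1; the abstract does not say how the thesis does this.
    """
    p = np.clip(a, 0, None) + eps
    q = np.clip(b, 0, None) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def identify_script(x, eigenspaces, distance=euclidean_distance):
    """Return the script whose eigenspace reconstructs x with the smallest distance."""
    x = np.asarray(x, dtype=float)
    scores = {name: distance(x, reconstruct(x, es)) for name, es in eigenspaces.items()}
    return min(scores, key=scores.get), scores
```

In this sketch, `eigenspaces` would be a dictionary such as `{"korean": fit_eigenspace(korean_samples, k), ...}`, where each sample set holds flattened character images of equal size. Blocks with high edge density are only candidates for text; they would still need the arrangement checks described in the abstract.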

Keywords/Search Tags:

text detection, multi-script identification, text arrangement, PCA, Euclidean distance, KL distance

Related items

1	Research On Text Detection And Multi-script Identification In Natural Images Based On Machine Learning
2	The Research On Text Identification And Detection Algorithm Of Natural Scene Images
3	Research On Script Identification In Text Images Based On Deep Learning
4	A Study On Packed Detection And Exuviate Based On Weighted Euclidean Distance
5	Research And Implementation Of Sensitive Text Classification Algorithm Based On Artificial Immune System
6	Classification From Local And Global Perspective For Scene Text Script Identification
7	An Intrusion Detection Method Based On Euclidean Distance
8	Research On Short Text Classification Based On Topic Model
9	Research On Privacy-preserving Distance Calculation Protocol And Its Application
10	Research On Text Similarity Algorithm Based On WMD Distance