
Research And Implementation Of Machine-Printed Mongolian Recognition System Based On Hadoop Platform

Posted on: 2017-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Yao
GTID: 2348330485961601
Subject: Software engineering
Abstract/Summary:
Machine-printed Mongolian character recognition is an important branch of Mongolian character recognition. With the development of Mongolian culture, a large number of documents (books, literature and so on) have been produced, and converting these documents into editable text is a challenging task. The simplest way is manual input, which not only wastes a great deal of resources but also introduces many errors. Optical character recognition (OCR), which automatically converts images into text, is therefore an alternative. However, the existing approach to machine-printed Mongolian character recognition cannot meet the requirements of practical applications. This thesis therefore concentrates on parallelizing machine-printed Mongolian character recognition.

Hadoop is based on the MapReduce programming model proposed by Google. Under the MapReduce model, an application can be decomposed into a number of tasks that are computed in parallel. The model consists of two phases: Map and Reduce. In the Map phase, the input data are split into a number of small blocks that are processed in parallel. In the Reduce phase, the results of all blocks are gathered to generate the output.

In this thesis, we propose an approach that parallelizes machine-printed Mongolian character recognition with Hadoop (i.e., the MapReduce model). In detail, a recognition task is decomposed and assigned to the nodes of the cluster, and the results from each node are then gathered into the final recognition result. A convolutional neural network is adopted to extract features.

The complete machine-printed Mongolian character recognition system has been implemented in this thesis. The system consists of two parts. The first part is an interface through which users upload Mongolian images. The second part is an application program that performs the machine-printed Mongolian character recognition. The architecture of the proposed system is easy to maintain and extend.

To test the performance of the proposed approach, a dataset consisting of 30,000 Mongolian word images was collected. On this dataset, the average execution time per word is 0.161 s and the accuracy is 92.03%. Experimental results show that the proposed approach can significantly improve recognition efficiency.
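To make the Map/Reduce decomposition of a recognition job more concrete, here is a minimal single-machine sketch in Python, assuming each word image arrives as an (image_id, pixel rows) pair. The function names (`split_into_blocks`, `map_task`, `reduce_task`, `run_job`) and the stand-in recognizer `fake_recognize` are hypothetical illustrations, not code from the thesis or part of Hadoop's API. A thread pool stands in for cluster nodes, and the real system would call its trained convolutional neural network wherever the stand-in recognizer is called.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Assumed data layout for illustration: (image_id, pixel_rows).
WordImage = Tuple[int, List[List[int]]]
Recognizer = Callable[[List[List[int]]], str]


def split_into_blocks(images: List[WordImage], num_blocks: int) -> List[List[WordImage]]:
    """Input splitting: divide the job into roughly equal blocks, one per map task."""
    size = max(1, -(-len(images) // num_blocks))  # ceiling division
    return [images[i:i + size] for i in range(0, len(images), size)]


def map_task(block: List[WordImage], recognize: Recognizer) -> List[Tuple[int, str]]:
    """Map: recognize every word image in one block, emitting (image_id, text) pairs."""
    return [(image_id, recognize(pixels)) for image_id, pixels in block]


def reduce_task(mapped: List[List[Tuple[int, str]]]) -> List[str]:
    """Reduce: gather all (image_id, text) pairs and restore the original word order."""
    merged: Dict[int, str] = {}
    for pairs in mapped:
        merged.update(pairs)
    return [merged[k] for k in sorted(merged)]


def run_job(images: List[WordImage], recognize: Recognizer, num_nodes: int = 4) -> List[str]:
    """Split the input, run one map task per block in parallel, then reduce."""
    blocks = split_into_blocks(images, num_nodes)
    # Threads stand in for cluster nodes; Hadoop would schedule map tasks on separate machines.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        mapped = list(pool.map(lambda b: map_task(b, recognize), blocks))
    return reduce_task(mapped)


# Hypothetical stand-in recognizer: a real system would run a trained CNN here.
def fake_recognize(pixels: List[List[int]]) -> str:
    return "word%d" % sum(map(sum, pixels))


demo_images = [(i, [[i]]) for i in range(10)]
print(run_job(demo_images, fake_recognize, num_nodes=3))
```

On an actual Hadoop cluster the same split would typically be expressed through the framework's `Mapper` and `Reducer` classes, with input splits derived from files stored in HDFS. Keying each result by image ID is what lets the reduce step put the recognized words back in their original order, however the map tasks were scheduled.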
Keywords: Mongolian recognition, Hadoop, MapReduce, convolutional neural network