
Research On Tibetan OCR Recognition Algorithm

Posted on: 2018-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: W P Li
GTID: 2428330596954776
Subject: Software engineering
Abstract/Summary:
Character recognition combines pattern recognition, image processing and word processing, and is an important branch of pattern recognition and artificial intelligence. After years of exploration and practice, recognition of Western and Chinese characters has become practical. Recognition of printed Tibetan, however, remains a challenging problem because of the relatively complex structure of the script and its relatively high proportion of similar characters. Building on existing techniques for recognizing printed Tibetan, this thesis applies the Extreme Learning Machine (ELM) algorithm to the Tibetan OCR process. In the feature extraction stage, three existing features are compared: the mapping feature, the grid feature and the pixel feature. Two fusion features based on the mapping feature and the grid feature are also proposed. In the recognition stage, the ELM is compared with the traditional support vector machine (SVM) and the BP neural network. The main work is as follows:
(1) Preprocessing of printed Tibetan. Binarization, character segmentation and normalization of Tibetan character images are studied. After preprocessing, all Tibetan character images have the same size.
(2) Feature extraction of printed Tibetan. The quality of feature extraction directly affects recognition accuracy for printed Tibetan. Three existing features are compared: the mapping feature, the grid feature and the pixel feature.
(3) Selection of the classification algorithm. Applied to printed Tibetan OCR, the ELM reaches a best recognition rate of 99.57%, compared with 97.60% for the traditional SVM and 93.29% for the BP neural network.
(4) Recognition of a new test set. To assess the generalization ability of the recognition algorithm, a new test set is introduced and two fusion features are proposed. The experimental results show that the ELM achieves its best accuracy (83.6%) with the second fusion feature, which is 4.51% higher than with the three features used in the third chapter. The proposed improvement, which uses QR decomposition instead of SVD, further increases the accuracy of the ELM.
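The abstract does not give implementation details, so the following is only a minimal sketch of the kind of pipeline it describes, under stated assumptions. It assumes the grid feature is the foreground-pixel density in each cell of a regular grid over a normalized binary character image, that the ELM uses a single sigmoid hidden layer with random, fixed input weights, and that the QR improvement refers to solving the output-weight least-squares problem by QR decomposition rather than through an SVD-based pseudoinverse. The function names, the hidden-layer size and the activation are illustrative choices, not the thesis's own.

```python
import numpy as np


def grid_features(img, grid=4):
    """Foreground density per cell of a grid x grid partition (assumed definition).

    img is a 2-D binary array whose height and width are multiples of grid.
    """
    h, w = img.shape
    cells = img.reshape(grid, h // grid, grid, w // grid)
    return cells.mean(axis=(1, 3)).ravel()


def hidden_layer(X, W, b):
    """Sigmoid activations of the random hidden layer."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def elm_train(X, y, n_hidden, n_classes, seed=0):
    """Train an ELM: random hidden weights, least-squares output weights via QR."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # fixed, never trained
    b = rng.standard_normal(n_hidden)
    H = hidden_layer(X, W, b)
    if H.shape[0] < n_hidden:
        raise ValueError("need at least as many samples as hidden units")
    T = np.eye(n_classes)[y]  # one-hot targets
    # Reduced QR: H = Q R, so min ||H beta - T|| is solved by R beta = Q^T T.
    # This assumes H has full column rank; a general solver is used for R here,
    # and a dedicated triangular solver would exploit its structure.
    Q, R = np.linalg.qr(H)
    beta = np.linalg.solve(R, Q.T @ T)
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Predicted class index for each row of X."""
    return np.argmax(hidden_layer(X, W, b) @ beta, axis=1)
```

In use, each normalized character image would be turned into a vector with `grid_features`, the vectors stacked into `X`, and `elm_train` called once; because only the output weights are fitted, training reduces to a single linear solve.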
Keywords/Search Tags: recognition of printed Tibetan, Extreme Learning Machine, feature extraction, Support Vector Machine, Back Propagation Neural Network