Design And Implementation Of Driving License Identification System Based On Tesseract_OCR

Posted on: 2019-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2322330563454329
Subject: Software engineering
Abstract/Summary:
As traffic in today's society continues to grow, rapid and accurate identification of vehicle-related licenses has become an important topic in the field of intelligent transportation. Without adding to existing equipment, replacing manual identification with machines makes it possible to check vehicle information quickly and to handle all types of emergencies in a timely manner, which brings both management and economic benefits. Building on evolving electronic information technology, OCR (optical character recognition) of driving licenses has become an important step toward implementing intelligent transportation. Because captured images commonly differ markedly in shooting environment, lighting uniformity, and tilt, each image has to be handled according to its own conditions. During processing, it is easy to lose text structure information, which leads to incorrect character recognition and can decrease the identification rate.

The main research content of this thesis is the recognition of driving licenses. The overall idea of the system design is a combination of extraction and recognition. Focusing on these two aspects, and after analyzing the requirements of driving license identification, a driver's license identification system is implemented in combination with related technologies and methods. The main tasks of this thesis are as follows:

1. Use the OpenCV library to complete text information extraction. Text information extraction consists of several modules: image preprocessing, red seal positioning, text area interception, text area correction, and character segmentation. For the red seal positioning module, a high-accuracy localization algorithm is designed according to the color and contour characteristics of the red seal. For the text area correction module, this thesis proposes a complete correction approach that takes into account the abnormal situations encountered when processing captured images. For character segmentation, this thesis proposes an accurate segmentation algorithm for overlapping characters.

2. Use the Tesseract-OCR engine and the TensorFlow framework to identify characters. First, the models are optimized based on an analysis of the training steps and the character library. Then, the models are trained and the character libraries are optimized for the character regions and the specific printed characters. Finally, character images of 16 × 32 pixels are identified.

3. Design a common system architecture. After considering the requirements, the system is divided into four modules: input and output, information extraction, information identification, and display editing. Although this thesis takes driving licenses as its main research object, the system is very versatile. For example, when using the system to identify driving licenses, only the positional relationship between the red seal and the textual information area needs to be modified, and the other modules can be reused.

Finally, the sample image library provided by the project is used for system testing. The test results show that the document identification approach presented in this thesis is correct and feasible: it improves both the efficiency of character recognition and the user experience. The goal is achieved, and the system can be used in further application research based on optical character recognition.
Keywords/Search Tags: Document identification, Information extraction, Character recognition, Tesseract, TensorFlow
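
The character segmentation step named in the abstract can be illustrated with a vertical projection profile, a common baseline technique. This is a minimal sketch only: the abstract does not describe the thesis's own algorithm for overlapping characters, and this baseline cannot separate characters that touch or overlap. The function names, the `min_width` parameter, and the toy image are all assumptions made for illustration, and plain Python lists stand in for an OpenCV image.

```python
from typing import List, Tuple


def vertical_projection(binary: List[List[int]]) -> List[int]:
    """Count foreground pixels (value 1) in each column of a binary image."""
    if not binary:
        return []
    width = len(binary[0])
    return [sum(row[x] for row in binary) for x in range(width)]


def segment_columns(binary: List[List[int]], min_width: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges that contain ink.

    Hypothetical helper: a run of non-empty columns is treated as one
    character; runs narrower than min_width are discarded as noise.
    """
    profile = vertical_projection(binary)
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x  # entering a run of inked columns
        elif count == 0 and start is not None:
            if x - start >= min_width:
                spans.append((start, x))
            start = None  # leaving the run
    # Close a run that extends to the right edge of the image.
    if start is not None and len(profile) - start >= min_width:
        spans.append((start, len(profile)))
    return spans


# Toy 3 x 12 binarized text line (1 = ink, 0 = background), made up for this sketch.
line = [
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0],
]
spans = segment_columns(line)
print(spans)
```

Each returned range could then be cropped and rescaled to the 16 × 32 pixel size mentioned in the abstract before being passed to a recognizer. Raising `min_width` drops narrow runs such as isolated noise columns, at the cost of also dropping genuinely thin characters.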