
The Research Of Document Image Detection And Information Extraction System Based On OCR

Posted on: 2016-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zou
Full Text: PDF
GTID: 2308330479990065
Subject: Computer Science and Technology
Abstract/Summary:
Information processing methods change constantly as science and technology develop, and every industry is moving toward electronic information. More and more companies use document image recognition based on Optical Character Recognition (OCR). Compared with traditional manual entry, OCR-based entry is faster. It saves considerable labor, improves the allocation of resources, and frees staff for more meaningful work. OCR gives users an efficient, low-cost way to acquire data, which supports rapid business growth. As a result, a large number of automatic image recognition systems and apps have emerged, for example ID card recognition, automatic instrument reading, and license plate recognition. Each of these products targets a fixed type of object and uses a recognition program specific to it. When users need to recognize many different kinds of document images, this single-object model becomes cumbersome. A general method is therefore needed that can automatically recognize the document type and extract its information. To meet this requirement, this thesis proposes an OCR-based recognition system for multiple kinds of document images.
Document images include the first pages of papers, documents, business cards, and so on. The system automatically classifies input images and extracts the information they contain. This thesis describes the proposed system in four parts: image preprocessing, document image detection, layout analysis, and information extraction. Salt-and-pepper noise removal in the preprocessing stage is one of the system's main innovations: after comparison with journal papers published in recent years, the thesis proposes a denoising method that performs well at both high and low noise levels. For skew correction, an improved projection algorithm keeps the estimated angle accurate while speeding up the angle search. For document image detection, the system uses the AdaBoost algorithm. Layout analysis is based on an improved clustering algorithm. Information is extracted using a prior rule base combined with Bayesian probability.
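The abstract does not give the details of the proposed salt-and-pepper denoising method, so the following is only a minimal sketch of a common baseline, not the thesis's algorithm. It assumes an 8-bit grayscale image stored as a list of rows, treats pixels equal to 0 or 255 as noise candidates, and replaces each one with the median of its non-extreme 3x3 neighbours. The function name `denoise_salt_pepper` and the parameters `low` and `high` are hypothetical.

```python
from statistics import median


def denoise_salt_pepper(img, low=0, high=255):
    """Selective 3x3 median filter (illustrative baseline, not the thesis's method).

    Only pixels equal to `low` or `high` are modified; all others are copied as-is.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # work on a copy so the input stays intact
    for y in range(h):
        for x in range(w):
            if img[y][x] not in (low, high):
                continue  # not a noise candidate; keep the original value
            # Collect neighbours in the 3x3 window that are not themselves extreme.
            neighbours = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
                if (j, i) != (y, x) and img[j][i] not in (low, high)
            ]
            if neighbours:
                out[y][x] = int(median(neighbours))
            # If every neighbour is extreme, the pixel is left unchanged.
    return out


noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 0, 100],
]
clean = denoise_salt_pepper(noisy)
```

Because genuine black text in a document image can also have the value 0, a fixed-value detector like this one would erase ink in practice. A method intended for both high and low noise levels, as the abstract describes, would need a more careful noise detector and probably an adaptive window size.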
Keywords/Search Tags: Document image, Image preprocessing, Image classification, Layout analysis, Information extraction