The Research Of Document Image Detection And Information Extraction System Based On OCR

Posted on:2016-08-21

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Zou

Full Text:PDF

GTID:2308330479990065

Subject:Computer Science and Technology

Abstract/Summary:

Advances in science and technology are changing information processing methods day by day, and every industry is following the trend toward electronic information. More and more companies use document image recognition based on Optical Character Recognition (OCR). Compared with traditional manual entry, OCR-based entry is faster. It saves a large amount of labor, optimizes the allocation of resources, and frees people for more meaningful work. OCR techniques give users a high-efficiency, low-cost way to acquire data, which supports rapid business development. As a result, a large number of automatic image recognition systems and apps have emerged, for example ID card recognition, automatic instrument reading, and automatic license plate recognition. Each of these products targets one fixed type of object and uses a recognition program specific to it. When users want to recognize many different kinds of document images, this single-object model becomes cumbersome. It is therefore necessary to develop a general method that automatically recognizes the document type and extracts information. Based on these requirements, this paper proposes an OCR-based recognition system for multiple types of document images.
Document images include paper first-page images, document images, business cards, and so on. The system can automatically discriminate input images and extract the image information. This article describes the proposed system mainly in terms of image preprocessing, document image detection, layout analysis, and information extraction. Salt-and-pepper noise removal in image preprocessing is one of the important innovations of this system: after comparison with journal papers published in recent years, this paper puts forward a salt-and-pepper denoising method that works well under both high and low noise. Second, for image skew correction, an improved projection algorithm ensures the accuracy of the angle and speeds up the angle search. For document image detection, the system uses the AdaBoost algorithm to detect the document. Image layout analysis is based on an improved clustering algorithm. Information is extracted using a priori rule base, with results obtained through Bayesian probability.
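
The abstract does not give the details of the proposed salt-and-pepper denoising method. As a rough illustration of the general idea only, the following is a minimal sketch of a generic switching median filter, not the thesis's algorithm. It assumes a grayscale image stored as a list of rows of integers and assumes that noise pixels take the extreme values 0 and 255; the function name `remove_salt_and_pepper` and its parameters are hypothetical.

```python
from statistics import median


def remove_salt_and_pepper(image, low=0, high=255):
    """Generic switching median filter (illustrative, not the thesis method).

    `image` is a list of rows of grayscale ints. Pixels equal to `low` or
    `high` are treated as noise candidates and replaced by the median of
    the non-extreme pixels in their 3x3 neighbourhood. All other pixels
    are left unchanged. The input image is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]  # copy so the input stays intact

    for y in range(height):
        for x in range(width):
            if image[y][x] not in (low, high):
                continue  # pixel is not an extreme value; keep it

            # Collect neighbours that are not themselves noise candidates,
            # clipping the 3x3 window at the image border.
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x) and image[ny][nx] not in (low, high)
            ]

            # If every neighbour is also extreme, leave the pixel as is.
            if neighbours:
                result[y][x] = int(median(neighbours))

    return result
```

Replacing only the extreme-valued pixels, rather than filtering every pixel, keeps text strokes sharper at low noise levels, and ignoring extreme-valued neighbours keeps the median meaningful at higher noise levels. A real document pipeline would likely need a larger or adaptive window when most of a neighbourhood is corrupted.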

Keywords/Search Tags:

Document image, Image preprocessing, Image classification, Layout analysis, Information extraction

Related items

1	Research On Layout Analysis And Text Line Extraction Of Document Image
2	Research On Document Image Layout Analysis And Text Extraction
3	The Key Technology Research Of Document Image Layout Analysis
4	Study On Document Image Layout Analysis Technology
5	Research And Realization On Document Image Retrieval Of Non-plain Text Oriented
6	A Study Based On Layout Analysis Of Document Image Retrieval Algorithm
7	Extraction And Retrieve Of The Feature Of Document Image
8	General Document Identification System Based On OCR Technology
9	Visual And Textual Based Document Image Layout Analysis Methods
10	Camera-Based Document Image Layout Analysis System On Android