
The Effective Information Extraction Algorithm Research Of Machine-printed Invoices

Posted on: 2015-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: F L Qu
Full Text: PDF
GTID: 2348330518476732
Subject: Signal and Information Processing
Abstract/Summary:
The effective information of a machine-printed invoice refers to the characters that reflect the attributes of the invoice. The main purpose of extracting this information is to make invoices easier to classify and manage. Conventional character recognition algorithms rely on character features from a single feature space, and their results are not very accurate. Combining features from multiple spaces therefore allows invoice characters to be identified more effectively.

As an important branch of digital image processing, character recognition is widely applied in government and commercial institutions and has become an active research direction in science and technology. Character recognition is widely acknowledged to be increasingly mature in vehicle license plate recognition, but it has not been studied as thoroughly in other fields, such as character recognition on printed invoices. To address the shortcomings of invoice management, this thesis proposes a recognition algorithm and a recognition system for the effective characters of printed invoices, covering image preprocessing, feature extraction, and classification of the effective character images extracted from the invoice. Finally, an Android application is developed so that the method can be put to use in various industries.

The research contents of this thesis are as follows:

1. Preprocessing of machine-printed invoice images. Single effective character images are extracted from the printed invoice image and then normalized to a uniform size. Extraction is based on the minimum bounding rectangle of each character, and size normalization is based on bilinear interpolation.
2. Building the training and testing samples of character images. Numeric character image samples and Chinese character image samples are established separately.
3. Extracting features of the effective Chinese characters from the invoice. First, each single effective character image extracted from the invoice is converted to a gray-level image and a binary image. Second, Butterworth and median filtering are applied to form a character image template. Finally, Gray Level Co-occurrence Matrix (GLCM), wavelet transform, Chinese character grid, and stroke density features are extracted.
4. Extracting features of the effective numeric characters from the invoice. First, each single effective numeric character image extracted from the invoice is binarized. The binary image is then thinned to extract the skeleton of the numeric character, and finally the corresponding structural features are extracted from the thinned skeleton.
5. Recognition of the effective characters on machine-printed invoices. After feature extraction, a classification model is built with a Support Vector Machine (SVM) and used for prediction, which yields the classification accuracy.

This thesis also establishes a recognition system for invoice effective characters. A single effective character image from an invoice is input into the system, which converts the character image to character text using the recognition method above and displays the result.
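The preprocessing step described in the abstract (cropping each character to its minimum bounding rectangle, then normalizing the size with bilinear interpolation) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `bounding_box`, `crop`, and `resize_bilinear` are hypothetical, the bounding rectangle is assumed to be axis-aligned, and the interpolation assumes the corner pixels of the input and output are aligned.

```python
def bounding_box(binary):
    """Return (top, left, bottom, right), inclusive, of the smallest
    axis-aligned rectangle containing every non-zero pixel.
    Assumption: foreground (character) pixels are non-zero."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    if not rows:
        raise ValueError("image has no foreground pixels")
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, box):
    """Cut the rectangle described by box out of a row-major image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def resize_bilinear(image, out_h, out_w):
    """Resize a row-major grayscale image with bilinear interpolation.
    Assumption: corner-aligned mapping between output and input grids."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Source row coordinate for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            # Source column coordinate for this output column.
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            upper = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            lower = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(upper * (1 - fy) + lower * fy)
        out.append(row)
    return out


# Toy 4x4 binary image with a small "character" in the middle.
img = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
box = bounding_box(img)
char = crop(img, box)
normalized = resize_bilinear(char, 3, 3)
```

In practice the target size would be fixed for all samples (the abstract does not state which size the thesis uses), so that every character yields a feature vector of the same length.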
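Among the features listed in the abstract, the Gray Level Co-occurrence Matrix (GLCM) counts how often a pixel of gray level i has a neighbour of gray level j at a fixed offset. The sketch below shows the general technique only. The names `glcm` and `glcm_features`, the default offset (one pixel to the right), and the choice of contrast, energy, and homogeneity as statistics are all assumptions, since the abstract does not say which offsets or statistics the thesis uses.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized, non-symmetric gray level co-occurrence matrix.
    image: row-major list of lists of integers in range(levels).
    (dr, dc): pixel offset to the neighbour (assumed default: right)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            # Only count pairs whose neighbour lies inside the image.
            if 0 <= r2 < h and 0 <= c2 < w:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """Three common GLCM texture statistics (illustrative selection)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


# Toy 2-level image: horizontal pairs are (0,0), (0,1), (1,1), (1,0).
toy = [
    [0, 0, 1],
    [1, 1, 0],
]
p = glcm(toy, levels=2)
features = glcm_features(p)
```

Statistics like these, possibly computed for several offsets, would be concatenated with the other features into one vector per character before being passed to a classifier such as the SVM mentioned in the abstract.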
Keywords/Search Tags: GLCM, SVM, feature extraction, machine-printed invoices, character recognition