Research And Implementation Of Optical Character Recognition For Insurance Claims Material

Posted on:2019-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z M Ye

GTID:2428330566986777

Subject:Engineering

Abstract/Summary:

As the economy develops, competition within the insurance industry grows ever fiercer, and relying on traditional means of market expansion no longer satisfies market demand. To stay competitive, the industry has steadily increased its investment in information technology. The claims departments of insurance companies must manually enter, analyze, and classify large numbers of claims documents. This repetitive work raises operating costs while remaining inefficient and error-prone. Insurance policies can already be entered with the help of optical character recognition (OCR), thanks to the new, high-definition policy format, which yields satisfactory recognition. Automatic entry has not, however, been applied to claims documents such as medication lists. These lists come from different medical institutions, and their details, such as drugs and medical items, are expressed differently and follow no uniform standard. Moreover, scanning alone does not produce documents clear enough to meet the quality requirements for data entry, so manual intervention is still required. This thesis addresses automatic entry of medication lists by combining image preprocessing of the medical documents with OCR, so as to provide a medication list recognition function for different systems.

The main work is as follows:

1. Image preprocessing is added before recognition to reduce the effects of lighting, stamps, and tilt, raising the recognition rate to 80% or more.
2. The open-source Tesseract engine is used as the recognition tool, and its set of training samples is expanded. Training Tesseract through machine learning mitigates the problems caused by insufficient Chinese training samples in the original recognition library and improves Tesseract's ability to recognize drug lists.
3. Because image preprocessing and OCR alone still give unsatisfactory results on medication lists, the thesis compares the applicability, advantages, and disadvantages of the comparative-probability method, the N-gram method, and the dictionary-lookup method, and adopts dictionary lookup. Terms misrecognized from the image characters are corrected by matching the recognition results against the dictionary, raising the recognition rate to 90% or more.

Finally, the thesis analyzes the recognition results and experiments, and discusses the problems encountered during the research, the shortcomings of the method, and possible future improvements.
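
The dictionary-based correction in item 3 might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how recognition results are matched against the dictionary, so the sketch substitutes approximate string matching from Python's standard `difflib` module. The names `DRUG_DICTIONARY`, `correct_term`, and `correct_terms`, the sample drug names, and the 0.8 similarity cutoff are all hypothetical. English placeholders stand in for the Chinese drug names the thesis targets; the same matching works on Chinese strings.

```python
import difflib

# Hypothetical dictionary of known drug names. A real system would load a
# curated list of drugs and medical items instead of these placeholders.
DRUG_DICTIONARY = ["amoxicillin", "ibuprofen", "metformin", "omeprazole", "paracetamol"]


def correct_term(ocr_term, dictionary=DRUG_DICTIONARY, cutoff=0.8):
    """Return the closest dictionary entry for an OCR result.

    If no entry is at least `cutoff` similar, the OCR text is returned
    unchanged so that it can be flagged for manual review.
    """
    key = ocr_term.strip().lower()
    if key in dictionary:
        return key
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(key, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_term


def correct_terms(ocr_terms, dictionary=DRUG_DICTIONARY, cutoff=0.8):
    """Apply correct_term to every recognized term on a medication list."""
    return [correct_term(term, dictionary, cutoff) for term in ocr_terms]


if __name__ == "__main__":
    # "1" misread for "l" and "c" misread for "e" are typical OCR confusions.
    print(correct_terms(["amoxicil1in", "ibuprofcn", "xyzzy"]))
```

A higher cutoff makes the correction more conservative: fewer misrecognized terms are repaired, but a valid term is less likely to be replaced by a different dictionary entry.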

Keywords/Search Tags:

List of drugs, machine learning, optical character recognition, Tesseract

Related items

1	Optical Character Recognition And Application Research Based On Machine Learning
2	Research On Character Recognition Of Vectorgraph Based On Deep Learning
3	Research And Implementation Of Character Recognition System Based On Tesseract
4	Research And Implementation Of Uyghur-Chinese Translation Software Based On Optical Character Recognition
5	The Research Of Optical Character Recognition Orient Digital Resource Aggregation Platform
6	Research On Tesseract_OCR Based Text Recognition System
7	Research On Character Recognition Based On Tesseract
8	Character Recognition Method Based On Active Learning SVM
9	Design And Implementation Of OCR Application Based On Android
10	Machine Printed Character Recognition System Using Feature Point Extraction And BP Neural Network Classifier