
Research And Implementation Of Optical Character Recognition For Insurance Claims Material

Posted on: 2019-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Ye
Full Text: PDF
GTID: 2428330566986777
Subject: Engineering
Abstract/Summary:
As the economy develops, competition within the insurance industry has become increasingly fierce, and traditional means of market expansion no longer satisfy market demand. To stay competitive, insurers have steadily increased their investment in information technology. Claims departments must manually enter, analyze, and classify large volumes of claims documents; this repetitive work raises operating costs while remaining inefficient and error-prone.

In the insurance industry today, policy entry can already be automated with optical character recognition (OCR). This is owing to the new, high-definition format of insurance policies, which leads to satisfactory recognition results. Automatic entry has not, however, been applied to claims documents (medication lists): the lists come from different medical institutions, and details such as drugs and medical items are expressed differently, without a uniform standard. Moreover, scanned documents alone are not clear enough to meet the data-quality requirements of system input, so manual intervention is required.

This thesis addresses the automatic entry of medication lists by combining image preprocessing of the medical documents with optical character recognition, so as to provide a medication list recognition function for different systems. The main work is as follows:

1. Image preprocessing operations are added before recognition to reduce the effects of lighting, stamps, and tilt, raising the image recognition rate to 80% or more.
2. The open-source Tesseract engine is used as the recognition tool, and its recognition samples are expanded. Training Tesseract through machine learning mitigates the problems caused by insufficient training on Chinese samples in the original recognition library and improves Tesseract's ability to recognize drug lists.
3. Because the recognition results obtained from image preprocessing and optical character recognition alone remain unsatisfactory, the thesis compares the applicability, advantages, and disadvantages of the probability comparison method, the N-Gram method, and the dictionary lookup method, and adopts dictionary lookup. Terms misrecognized from the image are corrected by matching the recognition results against the dictionary, which raises the recognition rate to 90% or more.

Finally, the thesis analyzes the recognition results and experiments, and discusses the problems encountered during the research, the shortcomings of the method, and possible future improvements.
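
The dictionary-lookup correction described in the abstract can be illustrated with a minimal sketch. The abstract does not say how recognized terms are matched against the dictionary, so the example below assumes a simple string-similarity match using Python's standard `difflib` module. The dictionary contents, the function names `correct_term` and `correct_terms`, and the `cutoff` value are all hypothetical and chosen only for illustration.

```python
import difflib

# Hypothetical drug dictionary. A real system would load a curated list of
# drug names and medical items; these entries are placeholders.
DRUG_DICTIONARY = ["amoxicillin", "ibuprofen", "metformin", "omeprazole"]


def correct_term(ocr_term, dictionary=DRUG_DICTIONARY, cutoff=0.8):
    """Return the closest dictionary entry for an OCR output term.

    If no entry reaches the similarity cutoff, the term is returned
    unchanged so it can be flagged for manual review instead of being
    silently replaced. The cutoff of 0.8 is an assumed value.
    """
    if ocr_term in dictionary:
        return ocr_term
    matches = difflib.get_close_matches(ocr_term, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_term


def correct_terms(ocr_terms, dictionary=DRUG_DICTIONARY, cutoff=0.8):
    """Apply dictionary correction to every term recognized on a list."""
    return [correct_term(term, dictionary, cutoff) for term in ocr_terms]


if __name__ == "__main__":
    # "1" misread for "l" and "0" misread for "o" are typical OCR confusions.
    print(correct_terms(["amoxicil1in", "ibupr0fen", "xyz"]))
```

A higher cutoff makes the correction more conservative: fewer misrecognized terms are repaired, but fewer valid terms that are simply absent from the dictionary get overwritten.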
Keywords/Search Tags: List of drugs, machine learning, optical character recognition, Tesseract