
Design And Implementation Of Laboratory Sheet Photo Recognition System

Posted on: 2020-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Sha
Full Text: PDF
GTID: 2392330596492262
Subject: Computer technology
Abstract/Summary:
With the progress of information technology and the continuous development of related fields such as image processing and artificial intelligence, applications based on computer vision have become common in daily life. A system that recognizes photographed laboratory test sheets and analyzes the results can give users a comprehensive interpretation of a test sheet, sparing them long queues to have it read and making up for cases where the doctor does not explain the results in detail. Optical Character Recognition (OCR) technology makes it easier to digitize paper documents, but photographs of paper test sheets suffer from uneven illumination and tilted layouts, and contain specialized vocabulary and special symbols, so applying a traditional OCR system directly yields a low recognition rate. This thesis analyzes the characteristics of the test sheet photo recognition problem, designs a recognition scheme in detail, and builds a convenient interactive website on top of it. The main contributions are as follows:

(1) Image preprocessing and layout analysis: To handle the characteristics of photographed images, the image is first binarized with a local adaptive binarization method, then tilt-corrected using the Hough transform, and then denoised with morphological operations. Finally, text blocks and non-text blocks are extracted by connected-component analysis. Based on the size of each connected component, noise, ultrasound images and other non-text areas on the test sheet are removed, and the individual text blocks are then grouped into text lines by single-linkage hierarchical clustering.

(2) Text line recognition: Using Tesseract-OCR directly gives a low recognition rate on this problem. This thesis trains Tesseract-OCR on manually labeled samples, which improves the engine's recognition rate on test sheet photos. A CRNN model is then retrained on synthetically generated samples of medical vocabulary and compared with Tesseract-OCR.

(3) Interactive system construction: Based on the requirements analysis, the front end and back end of the system are developed. The front end supports taking photos and displaying images, uploading local images, and displaying the recognition results. The back end implements image preprocessing, layout analysis, recognition, information storage and other functions.
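The text line grouping step in item (1) can be illustrated with a minimal sketch. The abstract does not say which distance measure or threshold is used, so the following are assumptions made only for illustration: text blocks are given as `(x, y, width, height)` boxes, the distance between two blocks is the difference of their vertical centres, and the hypothetical `max_gap` parameter is the cut-off distance. Cutting a single-linkage dendrogram at a fixed distance is equivalent to taking the connected components of the graph that links every pair of items closer than that distance, so a union-find structure is sufficient here.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); assumed block format


def vertical_gap(a: Box, b: Box) -> float:
    """Distance between the vertical centres of two boxes (assumed metric)."""
    return abs((a[1] + a[3] / 2) - (b[1] + b[3] / 2))


def cluster_into_lines(boxes: List[Box], max_gap: float) -> List[List[Box]]:
    """Group boxes into text lines by single-linkage clustering cut at max_gap."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of boxes whose distance is within the threshold.
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if vertical_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    # Collect the boxes of each cluster under its root.
    groups: Dict[int, List[Box]] = {}
    for i, box in enumerate(boxes):
        groups.setdefault(find(i), []).append(box)

    # Order boxes left to right within a line, and lines top to bottom.
    lines = [sorted(g, key=lambda b: b[0]) for g in groups.values()]
    lines.sort(key=lambda line: min(b[1] for b in line))
    return lines
```

For example, calling `cluster_into_lines(boxes, max_gap=5)` on boxes taken from two rows of a sheet returns two lists, one per row. The pairwise loop is quadratic in the number of blocks, which is usually acceptable for a single page; a real system would likely also need a threshold that scales with character height.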
Keywords/Search Tags: OCR, Laboratory sheet recognition, Layout analysis, Tesseract, CRNN