Utilizing big data in identification and correction of OCR errors

Posted on:2014-02-13

Degree:M.S.C.S

Type:Thesis

University:University of Nevada, Las Vegas

Candidate:Agarwal, Shivam

GTID:2450390008461015

Subject:Computer Science

Abstract/Summary:

In this thesis, we report on our experiments in detecting and correcting OCR errors with web data. More specifically, we use Google search to draw on the big data resources available on the web and identify possible correction candidates. We then use a combination of the Longest Common Subsequence (LCS) and Bayesian estimates to automatically pick the proper candidate.

Our experimental results on a small set of historical newspaper data show a recall of 51% and a precision of 100%. The thesis further provides a detailed classification and analysis of all errors. In particular, we point out the shortcomings of our approach in suggesting proper candidates to correct the remaining errors.

Keywords/Search Tags:

Errors, Data, Correction

Related items

1	Research On BDS Medium/Long-range Quickly Ambiguity Resolution And Regional Error Correction
2	Research On Lossless Compression Of High-throughput Genome Data
3	Study On Analysis And Correction Algorithms Of Nonlinear Errors In Phase-shifting Interferometry
4	Statistical Analysis And Variational Data Assimilation Experiments For Observation Errors Of FY-3 Microwave Observation
5	Beyond FIT2D: Calculating Intensity Errors for Data Analysis of X-Ray Synchrotron Powder Diffraction Data
6	Analysis And Error Correction Of Lighting Location Data In Hu Bao And Area
7	Study On The Errors Correction And Ocean-land Echo Waveforms Processing For HY-2A Radar Altimeter
8	Research Of The Atmospheric Refraction Errors Correction On The Neural Network In Photo-electricity Survey Information
9	Coefficient Functions Estimation And Property In Varying-Coefficient Errors-in-Variables Models With Missing Response Variables
10	Analysis And Correction Of Errors In Precipitation Measurement In Xinjiang