
The OCR Research In ROC Newspapers And Periodicals Digitization

Posted on: 2009-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Yan
GTID: 2178360245973953
Subject: Information Science
Abstract/Summary:
Newspapers and periodicals from the Republic of China period are an important part of the historical literature, distinctive in both content and form. In the last few years, as digitization has moved from an experimental, temporary activity towards one that is structural and continuous, mass digitization projects have been gaining ground. Almost simultaneously with this 'coming-of-age' of digitization, an increasing number of large-scale newspaper and periodical digitization projects have emerged. Because these materials appeal to a large audience and in many cases remain largely inaccessible, it is no surprise that many institutions decide to digitize their newspaper collections first. Digitization and web delivery make these collections available to a worldwide audience. Optical character recognition (OCR) technology, in spite of its shortcomings, offers better search-and-retrieval functionality than was possible before. Demand has grown for companies capable of providing digitization services for newspaper and periodical collections, and a new market is evolving, with rapidly changing customer needs and company solutions.
Whichever solution is adopted, scanning and character recognition remain essential steps. Although OCR technology has developed considerably and many software products are on the market, few articles discuss interface design, and how to increase processing speed is a subject worth studying. This thesis first presents the development, advantages, and feasibility of OCR. It then discusses the technical parameters of scanning and identifies the best settings. Next, it examines how OCR is commonly used and derives some strategies, and then gives an example of interface design. Finally, it reviews the main work completed and looks ahead to future work.
The other subject of this thesis is testing. A test on more than 100,000 characters shows that the proposed measures increase recognition capability and speed, especially for low-quality text. Recognition accuracy is about 90%, roughly 9 percentage points higher than before, and operating time also decreased. The contribution of this thesis is to put forward measures aimed at the practical characteristics of the text, especially interface design. These should help popularize OCR software and guide literature digitization in China.
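The accuracy figure above depends on how recognized text is compared with a reference transcription. The abstract does not say how accuracy was measured, so the following is only a minimal sketch of one common approach: character accuracy derived from edit distance. The function names `edit_distance` and `character_accuracy` and the sample strings are illustrative assumptions, not the method used in the thesis.

```python
def edit_distance(reference: str, recognized: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # previous[j] holds the distance between the processed reference prefix
    # and the first j characters of the recognized text.
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from OCR output
                current[j - 1] + 1,      # extra character in OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """Share of reference characters recognized correctly, clamped at 0."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))


if __name__ == "__main__":
    # Hypothetical sample: one wrong character out of ten.
    truth = "abcdefghij"
    ocr_output = "abcdefghiX"
    print(f"character accuracy: {character_accuracy(truth, ocr_output):.2%}")
```

Measured this way, accuracy can be compared before and after a change to scanning or recognition settings, provided both runs use the same reference text.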
Keywords/Search Tags: OCR software, interface design, newspapers and periodicals digitization, scanning