| With the development of the information age, the amount of information is continuously expanding, and quick access to information is one of the urgent needs of today’s society. In view of this, we present pre-processing work on PDF images for formula recognition, laying the foundation for future PDF formula recognition and retrieval. Firstly, we parse the PDF documents and extract the PDF images. Secondly, we filter out small images according to image width and height, classify the remaining images according to the color information of true-color images, the histogram information of gray images, and the black pixels of binary images, and obtain text images for character recognition. Finally, we pre-process the PDF images and formula regions: low-resolution images are identified and enhanced according to stroke width, formula regions are located according to the circles of special characters, formula regions are corrected according to the angle of the longest straight line in a symbol, and noise is removed according to the characteristics of the formula. The experiments show that these pre-processing steps help to improve the recognition accuracy of formulas in PDF images. |
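
The size filter and image-type split described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the pixel representation (nested lists of gray values or RGB tuples), the function names, and the 20-pixel size threshold do not come from the paper, and the classification rule is a simple stand-in rather than the authors' method.

```python
from typing import List, Tuple, Union

# Hypothetical in-memory image: rows of pixels, each pixel either a
# gray value (0-255) or an (R, G, B) tuple.
Pixel = Union[int, Tuple[int, int, int]]
Image = List[List[Pixel]]

# Assumed thresholds; the abstract does not state the values used.
MIN_WIDTH = 20
MIN_HEIGHT = 20


def is_too_small(image: Image, min_width: int = MIN_WIDTH,
                 min_height: int = MIN_HEIGHT) -> bool:
    """Return True if the image should be discarded by the size filter."""
    height = len(image)
    width = len(image[0]) if height else 0
    return width < min_width or height < min_height


def image_kind(image: Image) -> str:
    """Classify an image as 'color', 'gray', or 'binary'.

    An image is 'color' if any pixel has unequal channels, 'binary' if
    every gray value is 0 or 255, and 'gray' otherwise.
    """
    values = set()
    for row in image:
        for px in row:
            if isinstance(px, tuple):
                r, g, b = px
                if not (r == g == b):
                    return "color"
                values.add(r)
            else:
                values.add(px)
    return "binary" if values <= {0, 255} else "gray"


def black_pixel_ratio(image: Image) -> float:
    """Fraction of pixels that are pure black (gray value 0 or RGB (0, 0, 0))."""
    total = 0
    black = 0
    for row in image:
        for px in row:
            total += 1
            if px == 0 or px == (0, 0, 0):
                black += 1
    return black / total if total else 0.0
```

In a pipeline of this shape, images rejected by `is_too_small` would be dropped first, and the result of `image_kind` would then select which feature (color information, gray histogram, or black-pixel statistics such as `black_pixel_ratio`) decides whether an image is a text image.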