Acquisition Of Mathematical Expressions Information In PDF Documents Based On Document Properties

Posted on: 2016-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: B T Yu
GTID: 2308330479976941
Subject: Computer Science and Technology
Abstract/Summary:
Information acquisition from PDF documents has become a research hotspot because of the popularity of the format. PDF is oriented toward page description, and the contents of a document usually have no clear logical relationships, which makes it difficult to extract information from them. To meet the need for mathematical expression retrieval, we have studied a method, based on document properties, for acquiring mathematical expression information from PDF documents generated from text. This work supports the development of PDF document retrieval.

First, the bounding boxes of the characters are extracted directly from the font files of the PDF documents, and text lines are then extracted according to the display points of the characters. Next, the words in each line are segmented by analyzing the text-showing instructions. Finally, mathematical expressions are identified with rule-based methods, and the structure of each expression can be built from its typesetting order. The method takes advantage of document properties to extract mathematical expression information, which makes it more targeted and better suited to PDF documents of a specific type.
Keywords/Search Tags: PDF documents, Mathematical expression, Information acquisition, Font, Identification, Structure
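
The text-line step described in the abstract (grouping characters by their display points) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Glyph` record, the `group_into_lines` function, and the `y_tol` baseline tolerance are all hypothetical names and values chosen for illustration, and the sketch assumes the display points have already been read from the PDF content stream.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one character and its display point."""
    char: str
    x: float  # horizontal position of the display point, in page units
    y: float  # vertical position of the display point (baseline)


def group_into_lines(glyphs, y_tol=2.0):
    """Group glyphs into text lines by baseline proximity.

    A glyph joins the current line when its baseline lies within
    y_tol (an assumed tolerance) of the first glyph in that line.
    """
    lines = []
    # PDF y coordinates grow upward, so descending y reads top to bottom.
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if lines and abs(lines[-1][0].y - g.y) <= y_tol:
            lines[-1].append(g)
        else:
            lines.append([g])
    # Within each line, order glyphs left to right.
    for line in lines:
        line.sort(key=lambda g: g.x)
    return ["".join(g.char for g in line) for line in lines]


sample = [
    Glyph("b", 20.0, 700.0),
    Glyph("a", 10.0, 700.5),
    Glyph("x", 10.0, 680.0),
    Glyph("y", 18.0, 680.0),
]
print(group_into_lines(sample))
```

With a small tolerance, superscripts and subscripts fall outside the baseline band and form separate groups. A real system aimed at mathematical expressions would need to handle that case, for example by using the character bounding boxes mentioned in the abstract.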