The Extraction Of Mathematical Formulas In Word Documents For Math Retrieval

Posted on: 2016-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: W X Xu
Full Text: PDF
GTID: 2308330479476942
Subject: Computer technology
Abstract:
The Internet hosts a large number of Word documents, but only a small fraction of them contain mathematical formulas. Identifying whether a document contains mathematical formulas, and converting the formulas extracted from it from their various formats into a single unified format, are prerequisites for mathematical formula retrieval. To meet this need, this thesis presents a method for extracting mathematical formulas from Word documents. First, the current state of document parsing is reviewed. Then, the mathematical formulas in Word documents are detected with a method based on XML and OLE objects. Finally, formulas in OMML format are converted to LaTeX according to custom matching rules and stored in that format. In addition, formulas edited with MathType are converted to images, which a formula recognition function then converts to LaTeX. Experiments show that the method is effective for extracting and converting mathematical formulas in Word documents, and that it has practical value for building indexes of mathematical formulas and for retrieving them.
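The detection and conversion steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `omml_to_latex` and `extract_formulas` are hypothetical, the three conversion rules (fraction, superscript, subscript) stand in for the unspecified custom matching rules, and treating any OLE object whose `ProgID` starts with `Equation.` as an embedded equation is an assumption. The MathType image rendering and recognition step is not shown.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used inside word/document.xml of a .docx package.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"  # OMML
O = "urn:schemas-microsoft-com:office:office"                     # OLE objects


def _tag(ns, name):
    """Build the '{namespace}name' form that ElementTree uses for tags."""
    return f"{{{ns}}}{name}"


def _part(node, name):
    """Convert the named OMML child of node, or return '' if it is absent."""
    child = node.find(_tag(M, name))
    return omml_to_latex(child) if child is not None else ""


def omml_to_latex(node):
    """Convert a small subset of OMML to LaTeX with hand-written rules."""
    tag = node.tag
    if tag == _tag(M, "t"):            # literal text run
        return node.text or ""
    if tag == _tag(M, "f"):            # fraction: numerator / denominator
        return f"\\frac{{{_part(node, 'num')}}}{{{_part(node, 'den')}}}"
    if tag == _tag(M, "sSup"):         # superscript: base ^ sup
        return f"{{{_part(node, 'e')}}}^{{{_part(node, 'sup')}}}"
    if tag == _tag(M, "sSub"):         # subscript: base _ sub
        return f"{{{_part(node, 'e')}}}_{{{_part(node, 'sub')}}}"
    if tag.endswith("Pr"):             # property elements hold formatting only
        return ""
    # Any other element: concatenate the conversions of its children.
    return "".join(omml_to_latex(child) for child in node)


def extract_formulas(docx):
    """Return (latex_list, ole_prog_ids) for a .docx path or file-like object."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    latex = [omml_to_latex(m) for m in root.iter(_tag(M, "oMath"))]
    # Assumption: equation editors register a ProgID beginning with "Equation."
    ole = [
        obj.get("ProgID")
        for obj in root.iter(_tag(O, "OLEObject"))
        if (obj.get("ProgID") or "").startswith("Equation.")
    ]
    return latex, ole
```

A document with no `m:oMath` elements and no matching OLE objects yields two empty lists, which gives a simple test for whether a Word file contains formulas at all. Constructs outside the three rules fall through to plain concatenation of their text, so a fuller rule set would be needed for radicals, matrices, and similar structures.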
Keywords: Mathematical formula retrieval, Word document, Mathematical formula extraction, Format conversion, LaTeX