There are many Word documents on the Internet, but only a small fraction of them contain mathematical formulas. Identifying whether a document contains mathematical formulas, and converting the formulas extracted from it from their different formats into a single unified format, are prerequisites for mathematical formula retrieval. To meet the need for retrieving mathematical formulas, this paper presents a method for extracting mathematical formulas from Word documents. First, the current state of document parsing is introduced. Then, the mathematical formulas in Word documents are detected using a method based on XML and OLE objects. Finally, according to custom matching rules, the mathematical formulas in OMML format are converted to and stored in LaTeX format. In addition, formulas edited with MathType are converted to images, which a formula recognition function then converts to LaTeX format. Experiments show that the method is effective for extracting and converting mathematical formulas in Word documents, and that it has practical value for index construction and retrieval of mathematical formulas.
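The detection step based on XML and OLE objects can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a .docx file (a ZIP archive whose main part is word/document.xml), counts OMML formulas as `m:oMath` elements, and treats embedded OLE objects whose `ProgID` starts with `Equation.` (for example `Equation.DSMT4`, commonly written by MathType) as formula objects. The names `detect_formulas`, `make_sample_docx`, and `SAMPLE_XML` are hypothetical and introduced only for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by Word for Office Math (OMML) and for embedded OLE objects.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
O_NS = "urn:schemas-microsoft-com:office:office"


def detect_formulas(docx):
    """Return (omml_count, ole_prog_ids) for a .docx path or file-like object.

    Assumption: only word/document.xml is inspected; headers, footers,
    and footnotes are ignored in this sketch.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    # Every OMML formula is wrapped in an <m:oMath> element.
    omml_count = sum(1 for _ in root.iter(f"{{{M_NS}}}oMath"))

    # Equation editors embedded as OLE objects expose a ProgID attribute.
    ole_prog_ids = [
        el.get("ProgID", "")
        for el in root.iter(f"{{{O_NS}}}OLEObject")
        if el.get("ProgID", "").startswith("Equation.")
    ]
    return omml_count, ole_prog_ids


# Hypothetical sample: one OMML formula and one MathType-style OLE object.
SAMPLE_XML = (
    '<w:document '
    'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
    f'xmlns:m="{M_NS}" xmlns:o="{O_NS}">'
    "<w:body>"
    "<w:p><m:oMath><m:r><m:t>x</m:t></m:r></m:oMath></w:p>"
    '<w:p><w:r><w:object><o:OLEObject Type="Embed" ProgID="Equation.DSMT4"/>'
    "</w:object></w:r></w:p>"
    "</w:body></w:document>"
)


def make_sample_docx():
    """Build an in-memory archive holding only word/document.xml."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", SAMPLE_XML)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(detect_formulas(make_sample_docx()))
```

A document for which both results are empty would be classified as containing no formulas. OMML hits could then go to a rule-based LaTeX converter, and OLE hits to the image-based recognition path described above.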