
The Description And Retrieval Of Math Formulas In Scientific Documents

Posted on: 2008-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: T Lu
Full Text: PDF
GTID: 2178360272468314
Subject: Computer application technology
Abstract/Summary:
Math formulas are an effective way to express ideas precisely. In structure, a formula has a nonlinear layout, which is the main reason that describing and displaying math is harder than handling a single character or a string. In content, its meaning is clear and more precise than that of plain text. Because they express scientific information so well, math formulas are used widely in scientific documents. The huge number of formulas in scientific documents, together with easy access to the web, has created a demand for retrieving those formulas. Current retrieval systems, however, cannot search for formulas.
ScienceML is designed to carry both layout information, so that a math formula can be displayed on the web, and content information, so that it can be retrieved. A prototype MRS (Math Retrieval System) based on a B/S (browser/server) architecture has been built to make formula retrieval more effective. MRS consists of four components: a scientific document and math crawler, retrieval database design, math index construction, and math query. To enrich the source of math formulas, a web robot is implemented to download documents and extract formulas from the Internet. Three databases are created: a Scientific Document Database, a Math Database, and a Math Index Database. A math ID links the different databases used in the search process; it both reduces storage volume and speeds up queries. Based on an analysis of the strict logical relationships between the items of a math formula, a layer-based index strategy is adopted to improve search efficiency and to measure similarity. The layer-based indexing method abstracts the key features of a formula, with the operators at its core. Sub-structure and semantic queries are introduced to improve query accuracy.
The B/S-based MRS is convenient to access through any web browser.
According to statistics from system testing, MRS achieved a high score.
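The abstract does not give the details of the layer-based index, so the following is only a minimal sketch of the general idea, under stated assumptions: a formula is modelled as a nested tuple rather than ScienceML, each operator is indexed by its nesting depth ("layer"), postings store only math IDs, and similarity is the fraction of the query's (layer, operator) pairs that a formula shares. The names `operator_layers` and `LayerIndex` and the scoring rule are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict

# Assumption for illustration: a formula is a nested tuple
# (operator, operand, operand, ...); leaves (variables, constants) are strings.
# The thesis itself describes formulas in ScienceML.


def operator_layers(expr, depth=0):
    """Yield (layer, operator) pairs; the outermost operator is layer 0."""
    if isinstance(expr, tuple):
        op, *args = expr
        yield (depth, op)
        for arg in args:
            yield from operator_layers(arg, depth + 1)


class LayerIndex:
    """Hypothetical inverted index keyed by (layer, operator)."""

    def __init__(self):
        self.postings = defaultdict(set)  # (layer, operator) -> set of math IDs
        self.next_id = 1

    def add(self, expr):
        """Index one formula and return the math ID assigned to it."""
        math_id = self.next_id
        self.next_id += 1
        for key in operator_layers(expr):
            self.postings[key].add(math_id)
        return math_id

    def search(self, query):
        """Return (score, math_id) pairs, best match first.

        The score is the share of the query's distinct (layer, operator)
        pairs that also occur in the indexed formula.
        """
        keys = set(operator_layers(query))
        if not keys:
            return []
        hits = defaultdict(int)
        for key in keys:
            for math_id in self.postings.get(key, ()):
                hits[math_id] += 1
        scored = [(count / len(keys), math_id) for math_id, count in hits.items()]
        return sorted(scored, key=lambda s: (-s[0], s[1]))


index = LayerIndex()
index.add(("+", ("^", "a", "2"), ("^", "b", "2")))  # a^2 + b^2
index.add(("*", ("+", "a", "b"), "c"))              # (a + b) * c
results = index.search(("+", ("^", "x", "2"), "y"))  # x^2 + y
```

Because the postings hold only math IDs, the formula text and its source document would be looked up in separate tables by that ID, which mirrors the role the abstract gives to the math ID. Operand names are ignored in this sketch, so `x^2 + y` matches `a^2 + b^2` on operator structure alone.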
Keywords/Search Tags: Math Retrieval, Math Index, Layer-based Index, Web Robot