Research On Math Query Language And Index In Web-based Math Search

Posted on: 2010-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: K Jing
GTID: 2178360275495550
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of the Internet and of the information on the Web, users find it hard to locate what they need in such a vast pool of resources; it is like finding a needle in a haystack. Search engine technology addresses much of this problem. However, traditional text retrieval systems have significant limitations when searching for mathematical formulas and symbols, so they cannot meet the demand for formula search in fields such as science, mathematics, engineering and technology. Meanwhile, as more mathematical content is stored on computers and browsers offer better support for displaying mathematics, research on search engines for mathematical formulas has become feasible.

Based on a comparative study of the implementation and techniques of several existing mathematical formula search systems, this thesis focuses on two important and difficult problems in formula search: how to establish a general, powerful mathematical query language, and how to construct an index structure for mathematical content that is easy to store and query.

For the query language, we propose the Math Query Language (MQL), which extends XML and conforms to the MathML specification. MQL supports wildcard query expressions and combined query expressions by defining a series of metadata labels (tags) based on the MathML specification.
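
The abstract does not give MQL's actual syntax, so the following is only a minimal sketch of how a wildcard query over Content MathML might be evaluated. The tag name `anyexpr` and the functions `matches` and `search` are hypothetical and are not taken from the thesis; MathML namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET

WILDCARD = "anyexpr"  # hypothetical wildcard tag, not the thesis's real MQL tag


def matches(query, target):
    """Return True if the query tree matches the target tree node for node."""
    if query.tag == WILDCARD:
        return True  # the wildcard stands for any sub-expression
    if query.tag != target.tag:
        return False
    if (query.text or "").strip() != (target.text or "").strip():
        return False
    q_children, t_children = list(query), list(target)
    if len(q_children) != len(t_children):
        return False
    return all(matches(q, t) for q, t in zip(q_children, t_children))


def search(query, formula):
    """Return True if the query matches the formula or any of its sub-formulas."""
    return any(matches(query, node) for node in formula.iter())


# Content MathML for x^2 + 1
formula = ET.fromstring(
    "<apply><plus/><apply><power/><ci>x</ci><cn>2</cn></apply><cn>1</cn></apply>"
)
# Query: any expression raised to the power 2
query = ET.fromstring("<apply><power/><anyexpr/><cn>2</cn></apply>")
# Query: any expression raised to the power 3 (not present in the formula)
no_match = ET.fromstring("<apply><power/><anyexpr/><cn>3</cn></apply>")

print(search(query, formula))
print(search(no_match, formula))
```

Here the first query matches the sub-formula x^2 and the second matches nothing. A real query language would also need to handle namespaces, commutative operators and attribute constraints, which this sketch leaves out.
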
The MQL labels carry attributes that can be used to refine the query description and make query expressions more effective.

For the index, in order to support both presentation queries and semantic queries over mathematical formulas, we build both a Content-based Index and a Presentation-based Index for mathematical content. The Content-based Index mainly uses an inverted index structure over abstract trees, while the Presentation-based Index mainly uses an inverted index structure over linear N-grams. In addition, the thesis describes a method for evaluating the weight of each sub-formula while a formula is being indexed. This method can be used to optimize query results and to improve the recall and relevance of the search engine's results.
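
As an illustration of the linear N-gram inverted index idea, the sketch below indexes formulas that have already been flattened into token sequences. The token format, the function names (`ngrams`, `build_index`, `lookup`) and the simple count-based ranking are assumptions made for this example; they do not reproduce the thesis's index layout or its sub-formula weighting method.

```python
from collections import defaultdict


def ngrams(tokens, n):
    """Return all contiguous n-token windows of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(formulas, n=2):
    """Map each n-gram to the set of formula ids that contain it."""
    index = defaultdict(set)
    for fid, tokens in formulas.items():
        for gram in ngrams(tokens, n):
            index[gram].add(fid)
    return index


def lookup(index, query_tokens, n=2):
    """Rank formula ids by how many distinct query n-grams they contain."""
    scores = defaultdict(int)
    for gram in set(ngrams(query_tokens, n)):
        for fid in index.get(gram, ()):
            scores[fid] += 1
    # Highest score first; ties broken by formula id for a stable order
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical linearized presentation tokens for three formulas
formulas = {
    "f1": ["x", "^", "2", "+", "1"],
    "f2": ["y", "^", "2", "-", "1"],
    "f3": ["a", "+", "b"],
}
index = build_index(formulas)
results = lookup(index, ["x", "^", "2"])
print(results)  # [('f1', 2), ('f2', 1)]
```

The query shares two bigrams with f1 and one with f2, so f1 ranks first. Replacing the raw count with per-n-gram weights is one place where a sub-formula weighting scheme could plug in.
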
Keywords/Search Tags: MathML, Math query language, Semantic query, Index, Abstract tree, Inverted Table