
N-gram Index Structure For Semantic Based Mathematical Formulas

Posted on: 2016-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Xu
GTID: 2308330461467277
Subject: Computer software and theory
Abstract/Summary:
In recent years, mathematical formula retrieval has become a popular research problem. Mathematical formulas have complex structures and rich semantics, and querying sub-formulas, formula types, and common calculations are also meaningful research topics in formula search. However, current mathematical search systems at both domestic and foreign institutions do not consider the semantics of mathematical formulas, and no existing system has been found that supports the retrieval of sub-formulas, equivalent formulas, and relevant formulas.
First, the paper presents a method for dividing mathematical formulas into N-grams, determines the division granularity experimentally, and proposes a method for computing sub-formula weights. Experiments show that the N-gram division and index construction methods greatly facilitate sub-formula matching and weight computation.
Second, the paper defines equivalent formulas and relevant formulas. Because such formulas may be highly relevant to the query formula, they should be key ranking factors in mathematical search. Using Wolfram Alpha, the paper studies common types of mathematical formulas and, by combining formula features with user search intent, classifies formula types into three levels: the classification contains 27 first-level, 50 second-level, and 77 third-level types. The paper also compiles statistics on the common calculations of 250 formulas and identifies the factors that influence each common calculation. Finally, by analyzing these factors, the paper presents common calculation results to the user.
In addition, the paper designs and analyzes storage structures and processes for mathematical formulas.
This design supports equivalent-formula, relevant-formula, and sub-formula search, and it unifies the stored data with the query data.
In summary, this paper constructs a semantic-based N-gram indexing mechanism for mathematical formulas. With this mechanism, the system can search for sub-formulas, equivalent formulas, and relevant formulas, and it improves the recall and precision of formula retrieval. The method is fast and practical, can meet the needs of different users, and substantially improves the efficiency of semantic mathematical search.
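The abstract does not give the thesis's division granularity, weighting formula, or index layout, so the following is only a minimal Python sketch of the general idea under stated assumptions: formulas are treated as LaTeX-like strings split by a hypothetical regex tokenizer, N-grams are contiguous token windows of length 1 to 3, and the sub-formula weight is a TF-IDF-style score swapped in for illustration, not the thesis's own weighting method. The names `tokenize`, `ngrams`, `NGramIndex`, `weight`, and `search` are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer (an assumption): LaTeX commands such as \frac,
# runs of letters/digits, and any other single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|[A-Za-z0-9]+|\S")


def tokenize(formula: str) -> list[str]:
    return TOKEN_RE.findall(formula)


def ngrams(tokens: list[str], n_min: int = 1, n_max: int = 3) -> list[tuple[str, ...]]:
    """Return every contiguous token n-gram with n_min <= n <= n_max."""
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


class NGramIndex:
    """Inverted index mapping each n-gram to {formula_id: occurrence count}."""

    def __init__(self, n_min: int = 1, n_max: int = 3) -> None:
        self.n_min, self.n_max = n_min, n_max
        self.postings: dict[tuple[str, ...], Counter] = defaultdict(Counter)
        self.formulas: dict[int, str] = {}

    def add(self, formula_id: int, formula: str) -> None:
        """Divide a formula into n-grams and record each one in the postings."""
        self.formulas[formula_id] = formula
        for gram in ngrams(tokenize(formula), self.n_min, self.n_max):
            self.postings[gram][formula_id] += 1

    def weight(self, gram: tuple[str, ...], formula_id: int) -> float:
        """Assumed TF-IDF weight of one n-gram in one formula (illustrative only)."""
        docs = self.postings.get(gram)
        if not docs or formula_id not in docs:
            return 0.0
        idf = math.log(len(self.formulas) / len(docs)) + 1.0
        return docs[formula_id] * idf

    def search(self, sub_formula: str) -> list[tuple[int, float]]:
        """Return (formula_id, score) for formulas containing every query n-gram, best first."""
        query = set(ngrams(tokenize(sub_formula), self.n_min, self.n_max))
        if not query:
            return []
        candidates = None
        for gram in query:
            ids = set(self.postings.get(gram, {}))
            candidates = ids if candidates is None else candidates & ids
        scored = [(fid, sum(self.weight(g, fid) for g in query)) for fid in candidates]
        # Highest score first; ties broken by formula id for a stable order.
        return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

For example, after indexing `x^2 + y^2 = z^2` (id 1), `\frac{a}{b} + x^2` (id 2), and `\sin x + \cos x` (id 3), searching for `x^2` returns ids 1 and 2 in that order, because the `^` and `2` n-grams occur three times in the first formula. Requiring every query n-gram to appear is only a candidate filter: for a query longer than `n_max` tokens it does not guarantee a contiguous match, so a fuller system would verify candidates, and the thesis determines its own granularity experimentally.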
Keywords/Search Tags: Search engine, Math search, Formula search, N-gram division, Sub-formula weight, Common calculation