
Design and Implementation of a Mathematical Expression Retrieval System

Posted on: 2015-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Li
Full Text: PDF
GTID: 2298330422971980
Subject: Computer technology
Abstract/Summary:
The development of the internet has made it far easier to produce and spread information. With the help of search engines, people can find many kinds of information online. In the specialized field of mathematical expression search, however, current search engines based on text retrieval technology can neither accept mathematical expressions as input nor retrieve them effectively. How to retrieve mathematical expressions as easily as plain text is one of the problems that information retrieval still needs to solve.

This thesis builds a prototype mathematical expression retrieval system, FormulaSearch. It is based on mature text retrieval technology and the mathematical description language MathML, and it extends the full-text retrieval framework Lucene to support mathematical expression retrieval. The main contents are as follows:

1) The thesis analyzes the general mathematical expression retrieval model and typical mathematical expression retrieval systems, and then proposes the architecture of FormulaSearch. FormulaSearch consists of an input module, a preprocessing module, an index module, and a retrieval module. The input module handles the input of mathematical expressions and their conversion to MathML code. The preprocessing module handles the extraction and segmentation of mathematical expressions. The index module handles the construction of the inverted index. The retrieval module handles the retrieval of mathematical expressions and the sorting and highlighting of results.

2) To obtain better retrieval results when applying Lucene to mathematical expressions, which are highly structured, the thesis redesigns the Lucene analyzer for mathematical expressions. The design is based on the DOM tree of the MathML document: each subtree is regarded as one segmentation result, so the segmentation of a mathematical expression is obtained by analyzing its DOM tree.

3) To meet the practical needs of the retrieval system, the thesis implements a simple mathematical expression editor using Silverlight. The editor supports the input of basic mathematical expressions containing operators such as “+”, “-”, “*”, and “%”.

4) To evaluate the retrieval results of FormulaSearch, the thesis measures system response time, recall, precision, and F-Measure, using 200 junior middle school mathematical expressions as test data. The experimental results show that the system performs well in practical retrieval.
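
The subtree-based segmentation in item 2) and the inverted index in item 1) can be illustrated with a minimal sketch. This is not the thesis's Lucene analyzer: it uses only the Python standard library, and the function names (`local_name`, `serialize`, `subtree_tokens`, `build_index`) and the token string format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/1998/Math/MathML}mi'."""
    return tag.rsplit("}", 1)[-1]


def serialize(node):
    """Assumed token format: tag(text) for a leaf, tag[child,child,...] otherwise."""
    children = list(node)
    if not children:
        return f"{local_name(node.tag)}({(node.text or '').strip()})"
    return f"{local_name(node.tag)}[{','.join(serialize(c) for c in children)}]"


def subtree_tokens(mathml):
    """Return one token per element; every element is the root of one subtree."""
    root = ET.fromstring(mathml)
    return [serialize(node) for node in root.iter()]


def build_index(docs):
    """Build an inverted index mapping each subtree token to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, mathml in docs.items():
        for token in subtree_tokens(mathml):
            index[token].add(doc_id)
    return index


if __name__ == "__main__":
    # Hypothetical sample documents: x^2 and x^2 + 1.
    docs = {
        "d1": "<math><msup><mi>x</mi><mn>2</mn></msup></math>",
        "d2": "<math><mrow><msup><mi>x</mi><mn>2</mn></msup>"
              "<mo>+</mo><mn>1</mn></mrow></math>",
    }
    index = build_index(docs)
    # Looking up the subtree for x^2 finds both documents.
    print(sorted(index["msup[mi(x),mn(2)]"]))
```

Because every subtree becomes its own token, a query for a sub-expression such as x^2 can match any larger expression that contains it, which plain-text tokenization of the MathML markup would not guarantee.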
Keywords/Search Tags: mathematical expression, mathematical expression retrieval, search engine, analyzer