
Scientific Document Retrieval Based On Semantic Expansion Of Mathematical Expressions

Posted on: 2022-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: S W Dong
Full Text: PDF
GTID: 2518306722970249
Subject: Computer Science and Technology
Abstract/Summary:
Mathematical expressions are an important part of scientific and technical documents. The spatial structure of their symbol arrangements is difficult to capture with traditional text search methods, and retrieval methods based on mathematical expressions add complexity to scientific document retrieval, which in turn makes retrieval harder. Most existing mathematical expression retrieval methods focus only on indexing the expression structure and ignore the semantics of the expression within the document. To address this problem, this thesis proposes a query expansion method for scientific documents based on the semantics of mathematical expressions. The method extracts the semantic keywords of mathematical expressions (referred to as mathematical text) from the surrounding text and integrates them into a structure-based retrieval model for mathematical expressions.

First, the mathematical text corresponding to each mathematical expression is extracted. This thesis combines the SAO structure with dependency syntax to formulate extraction rules and obtain a candidate set of mathematical text; an improved TF-IDF value is then combined with statistical features of syntax and location to filter the candidates and obtain the final mathematical text. Second, a semantic expansion retrieval model for mathematical expressions is designed. The operator structure of two-dimensional expressions is analyzed with the formula description structure (FDS) and, combined with the mathematical text, used to build an index linking expression structure and semantics. Through a secondary retrieval over the mathematical text, the query is expanded to mathematical expressions that have different structures but share the same semantics. Finally, the retrieval results are output. The similarity between the mathematical text of each result expression and the subject of its scientific document is computed and used as the basis for judging the importance of the expression in that document; the documents containing the result expressions are then output in order of similarity, realizing scientific document retrieval based on the semantic expansion of mathematical expressions.

Based on these ideas, experiments were conducted on the public mathematical information retrieval data set NTCIR-12-MathIR-Wikipedia-Corpus. The results show that the mathematical text extraction method combining dependency syntax and statistical features matches the meaning of mathematical expressions more closely, with an F-measure 0.47 higher than the traditional method. In terms of recall, introducing semantic retrieval of mathematical expressions yields a value 0.13 higher than the retrieval method that considers only expression structure, and the result ranking is slightly better than that of the WikiMirs 3.0 method. The experimental results therefore show that the proposed method helps improve expression retrieval performance: it not only expands the query but also fits the subject of scientific documents better.
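The expansion and ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and everything in it is an assumption made for illustration: the `ExpressionIndex` class, the `rank_documents` function, the plain string standing in for an FDS structure key, and the bag-of-words cosine similarity standing in for whatever similarity measure the thesis actually uses. Expansion here follows any shared mathematical-text term, which is deliberately loose.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two term-count bags (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ExpressionIndex:
    """Hypothetical index linking expression structure and mathematical text."""

    def __init__(self):
        self.by_structure = defaultdict(set)  # structure key -> expression ids
        self.by_term = defaultdict(set)       # math-text term -> expression ids
        self.records = {}                     # expression id -> (key, terms, doc id)

    def add(self, expr_id, structure_key, math_text, doc_id):
        terms = Counter(math_text)
        self.records[expr_id] = (structure_key, terms, doc_id)
        self.by_structure[structure_key].add(expr_id)
        for term in terms:
            self.by_term[term].add(expr_id)

    def expand(self, structure_key):
        """Structural lookup first, then a secondary lookup by mathematical text."""
        seeds = set(self.by_structure.get(structure_key, ()))
        expanded = set(seeds)
        for expr_id in seeds:
            for term in self.records[expr_id][1]:
                # Pull in expressions that share a term, whatever their structure.
                expanded |= self.by_term[term]
        return expanded


def rank_documents(index, expr_ids, doc_subjects):
    """Order documents by the best math-text/subject similarity of their expressions."""
    best = {}
    for expr_id in expr_ids:
        _, terms, doc_id = index.records[expr_id]
        score = cosine(terms, Counter(doc_subjects.get(doc_id, ())))
        best[doc_id] = max(score, best.get(doc_id, 0.0))
    # Highest similarity first; ties broken by document id for determinism.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Toy data, invented for this example only.
index = ExpressionIndex()
index.add("e1", "a^2+b^2=c^2", ["pythagorean", "theorem"], "d1")
index.add("e2", "c=sqrt(a^2+b^2)", ["pythagorean", "theorem", "hypotenuse"], "d2")
index.add("e3", "E=mc^2", ["mass", "energy", "equivalence"], "d3")

doc_subjects = {
    "d1": ["pythagorean", "theorem"],
    "d2": ["right", "triangle", "geometry", "hypotenuse"],
    "d3": ["special", "relativity"],
}

expanded = index.expand("a^2+b^2=c^2")  # contains e1 and e2, but not e3
for doc_id, score in rank_documents(index, expanded, doc_subjects):
    print(doc_id, round(score, 3))
```

In this toy run the structural query matches only `e1`, while the secondary lookup over its mathematical text also retrieves `e2`, whose structure differs but whose semantics overlap. The documents are then ordered by how closely each expression's mathematical text matches the document subject.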
Keywords/Search Tags:Mathematical expression, Extended query, Expression semantics, Dependency syntax, FDS, Word embedding