An English Scientific Document Retrieval Method Based On Formula Description Structure And Word Embedding

Posted on: 2021-02-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zai
GTID: 2428330620970581
Subject: Engineering
Abstract/Summary:
With the rapid development of science and technology, the demand for exchanging technical information is growing more urgent and more diverse. Technical documents are rich in formal content such as mathematical expressions, so conventional full-text retrieval cannot meet researchers' actual needs, and retrieving technical information through this formal content is a problem that urgently needs solving. Traditional retrieval methods based on mathematical formulas also struggle to meet these needs. Based on an analysis of the structure of mathematical formulas in technical documents and of word semantics in English technical documents, this thesis studies and designs an English technical document retrieval model adapted to both the complex structure of formulas and the semantics of words.

First, following the two aspects of a query on technical documents, the documents are preprocessed for mathematical expressions and for keywords. The former covers mathematical expression extraction, parsing, and index construction; the latter uses an automatic keyword extraction algorithm to extract keywords from each English technical document and compute their weights in that document. Second, the formula description structure (FDS) method, which is well suited to handling complex formula structures, is used to eliminate the matching interference caused by general operands and to implement technical document retrieval based on mathematical formulas. Finally, the word embedding model, a neural-network distributed representation from deep learning, is introduced and optimized to fit the characteristics of technical documents. Converting query keywords and document keywords into word vectors with the same model strengthens the relevance between words, and technical documents are then ranked using both expression structure information and word vectors.

The proposed retrieval model is tested on 38,165 English technical documents from the public NTCIR dataset. The experimental results show a recall of 0.77 and a precision of 0.63, which outperforms the FDS-only method and demonstrates that the proposed method satisfies users' retrieval requirements more effectively across multiple aspects.
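The ranking step described above can be illustrated with a minimal sketch. The abstract does not say how the formula structure score and the word-vector relevance are combined, so everything here is an assumption made for illustration: the linear mixing weight `alpha`, the hypothetical helpers `cosine`, `keyword_score`, and `rank`, and the premise that each document already carries a formula match score in [0, 1] from the FDS stage and a list of weighted keyword vectors from the preprocessing stage.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def keyword_score(query_vecs, doc_keywords):
    """Average, over query keywords, of the best weighted match in the document.

    doc_keywords is a list of (vector, weight) pairs, where weight stands in
    for the keyword weight computed during preprocessing (assumed in [0, 1]).
    """
    if not query_vecs or not doc_keywords:
        return 0.0
    total = 0.0
    for q in query_vecs:
        total += max(w * cosine(q, v) for v, w in doc_keywords)
    return total / len(query_vecs)


def rank(docs, query_vecs, alpha=0.5):
    """Rank documents by an assumed linear mix of formula and keyword scores.

    docs is a list of (doc_id, formula_score, doc_keywords) tuples.
    alpha is a hypothetical weight, not a value taken from the thesis.
    """
    scored = []
    for doc_id, formula_score, doc_keywords in docs:
        score = alpha * formula_score + (1 - alpha) * keyword_score(query_vecs, doc_keywords)
        scored.append((doc_id, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With two documents that match the query formula equally well, the one whose keyword vectors lie closer to the query keywords ranks first, which is the effect the abstract attributes to adding word embeddings on top of FDS-only retrieval.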
Keywords/Search Tags:Technical document retrieval, Formula, Semantic correlation, FDS, Word embedding