
The Two-Step Multi-Model Paper Citation Matching Algorithm

Posted on: 2022-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: W L Chen
Full Text: PDF
GTID: 2518306524989499
Subject: Master of Engineering
Abstract/Summary:
Scientific research led by scientists has become the main driving force of innovation in modern society. The number of papers and articles is growing rapidly across fields and directions. Each article discusses new inventions and new theories, and influential articles usually become highly cited. Many papers contribute to future research and spark new ideas, and their novel results also let us speculate about how scientific research will develop. This thesis aims to develop a set of algorithms that can automatically understand a description and identify the cited paper it corresponds to. Such algorithms can reduce the time authors spend searching for references when writing papers, deepen the understanding of the scientific research context, and support progress in knowledge mapping for scientific research, automatic question answering systems, and automatic summarization systems.

Based on information retrieval, this thesis proposes a two-step multi-model citation matching algorithm. The algorithm consists of two parts: a recall algorithm for paper citation matching based on the fusion of text recall algorithms, and a paper citation matching algorithm based on tree models and pre-trained models. On the recall side, we propose a recall strategy for large-scale citation screening that uses a weighted boosting algorithm over word vectors and a weighted bag-of-n-grams algorithm to achieve accurate and fast recall. In the matching algorithm based on tree and pre-trained models, a pre-trained model optimized for the specific domain is used to match paper citations. At the same time, a gradient boosting decision tree algorithm built on a paper citation feature framework is developed without the help of external data. In the final stage, ensemble learning exploits the differences between the models and fuses the two. The algorithm also won first place in the WSDM Cup 2020.
Keywords/Search Tags:Natural language processing, text matching, decision tree model, recall
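
The two-step structure described in the abstract (recall a small candidate set, then match with two fused models) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The abstract does not give formulas, so the IDF weighting, the word-level 1- and 2-grams, the linear score blend, and all function names (`ngrams`, `idf_weights`, `recall`, `fuse`) are assumptions made for illustration. The two score lists in the second step are made-up placeholders standing in for the outputs of a gradient boosting decision tree and a pre-trained model.

```python
import math
from collections import Counter


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def idf_weights(docs):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    df = Counter(g for d in docs for g in set(ngrams(d)))
    return {g: math.log((1 + len(docs)) / (1 + c)) + 1.0 for g, c in df.items()}


def recall(query, docs, idf, k=2):
    """Step 1 (assumed form): keep the k papers with the highest
    IDF-weighted n-gram overlap with the citing description."""
    q = set(ngrams(query))
    scores = [sum(idf.get(g, 0.0) for g in q & set(ngrams(d))) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def fuse(scores_a, scores_b, w=0.5):
    """Step 2 (assumed form): linear blend of two rankers' scores."""
    return [w * a + (1 - w) * b for a, b in zip(scores_a, scores_b)]


# Toy corpus and description, invented for this example.
papers = [
    "gradient boosting decision trees for ranking",
    "pre-trained language models for biomedical text",
    "a survey of graph neural networks",
]
description = "we fine-tune pre-trained language models on biomedical text"

idf = idf_weights(papers)
candidates = recall(description, papers, idf, k=2)

# Placeholder scores for the recalled candidates; a real system would
# obtain these from a trained tree model and a fine-tuned pre-trained model.
tree_scores = [0.9, 0.2]
plm_scores = [0.7, 0.4]
fused = fuse(tree_scores, plm_scores, w=0.5)
best = candidates[max(range(len(fused)), key=lambda i: fused[i])]
```

The cheap recall step narrows a large paper collection to a short candidate list, so the more expensive matching models only score a few papers per description. Blending the two models' scores is one simple way to exploit their differences; the blend weight `w` would normally be tuned on validation data.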